net: ethernet: mediatek: fix runtime warning raised by inconsistent struct device pointers passed to DMA API
Runtime warning occurs if DMA-API debug feature is enabled that would be
raised by pointers passed to DMA API as arguments to inconsistent struct
device objects, so that the patch makes them usage aligned between DMA
operations such as dma_map_*() and dma_unmap_*() to eliminate the warning.
net: ethernet: mediatek: fix flow control settings on GMAC0 is not being enabled properly
Commit 08ef55c6f257acf3bdc6940813f80e8f0f5d90ec
("net-next: mediatek: fix gigabit and flow control advertisement")
had supported proper flow control settings for GMAC1. But for GMAC0,
1.GMAC0 shares the common logic with GMAC1 inside mtk_phy_link_adjust()
to adapt various settings for the target phy.
2.GMAC0 uses fixed-phy to connect to a builtin gigabit switch with
fixed link speed as commit 0c72c50f6f93b0c3daa9ea35d89ab3a933c7b5a0
("net-next: mediatek: add fixed-phy support") describes.
3.However, fixed-phy doesn't enable SUPPORTED_Pause & SUPPORTED_Asym_Pause
supported flag on default that would cause mtk_phy_link_adjust() not to
enable flow control setting on GMAC0 properly and cause packet dropped
when high traffic.
Due to these reasons, the patch adds SUPPORTED_Pause & SUPPORTED_Asym_Pause
supported flags on fixed-phy used by the driver to have proper handling on
the both GMAC with the shared common logic.
net: ethernet: mediatek: fix RMII mode and add REVMII supported by GMAC
The patch fixes up the incorrect setup of reduced MII (RMII) on GMAC
and adds the supplement for the setup of reverse MII (REVMII) on GMAC
, and rearranges the error handling for invalid PHY argument.
I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
userspace is equipped to handle that. Anyway, this is better than a GPF
and looks somewhat consistent with other tipc_msg_create() callers.
David S. Miller [Mon, 15 Aug 2016 20:48:08 +0000 (13:48 -0700)]
Merge branch 'hv_netvsc-VF-removal-fixes'
Vitaly Kuznetsov says:
====================
hv_netvsc: fixes for VF removal path
Kernel crash is reported after VF is removed and detached from netvsc
device. Turns out we have multiple different (but related) issues on the
VF removal path which I'm trying to address with PATCHes 2-5 of this
series. PATCH1 is required to support the change.
Changes since v1:
- Re-arrange patches in the series to not introduce new issues [David Miller]
- Add PATCH5 which fixes a new issue I discovered while testing.
- Add Haiyang' A-b tags to PATCH1-4
With regards to Stephen's suggestion: I believe that switching to using RCU
and eliminating vf_use_cnt/vf_inject is the right thing to do long-term, we
can either put this on top of this series or do it later in net-next.
====================
Vitaly Kuznetsov [Mon, 15 Aug 2016 15:48:43 +0000 (17:48 +0200)]
hv_netvsc: fix bonding devices check in netvsc_netdev_event()
Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, there is an uncertainty when a slave
interface is removed: if bonding module comes first in netdev_chain it
clears IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles NETDEV_UNREGISTER event, but in case netvsc comes first on the
chain it sees the device with IFF_BONDING still attached and skips it. As
we still hold vf_netdev pointer to the device we crash on the next inject.
Vitaly Kuznetsov [Mon, 15 Aug 2016 15:48:42 +0000 (17:48 +0200)]
hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev
We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.
Vitaly Kuznetsov [Mon, 15 Aug 2016 15:48:41 +0000 (17:48 +0200)]
hv_netvsc: reset vf_inject on VF removal
We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.
Vitaly Kuznetsov [Mon, 15 Aug 2016 15:48:40 +0000 (17:48 +0200)]
hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
is being executed from netdev notifier chain we hold rtnl lock when we
get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
will hang on rtnl_lock().
- deadlock!
Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.
Vitaly Kuznetsov [Mon, 15 Aug 2016 15:48:39 +0000 (17:48 +0200)]
hv_netvsc: don't lose VF information
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.
Simon Horman [Mon, 15 Aug 2016 11:06:24 +0000 (13:06 +0200)]
gre: set inner_protocol on xmit
Ensure that the inner_protocol is set on transmit so that GSO segmentation,
which relies on that field, works correctly.
This is achieved by setting the inner_protocol in gre_build_header rather
than each caller of that function. It ensures that the inner_protocol is
set when gre_fb_xmit() is used to transmit GRE which was not previously the
case.
I have observed this is not the case when OvS transmits GRE using
lwtunnel metadata (which it always does).
Linus Torvalds [Mon, 15 Aug 2016 19:36:31 +0000 (12:36 -0700)]
Merge tag 'iommu-fixes-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- Some functions defined in a header file for the mediatek driver were
not marked inline. Fix that oversight.
- Fix a potential crash in the ARM64 dma-mapping code when freeing a
partially initialized domain.
- Another fix for ARM64 dma-mapping to respect IOMMU mapping
constraints when allocating IOVA addresses.
* tag 'iommu-fixes-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/dma: Respect IOMMU aperture when allocating
iommu/dma: Don't put uninitialised IOVA domains
iommu/mediatek: Mark static functions in headers inline
Lorenzo Colitti [Fri, 12 Aug 2016 16:13:38 +0000 (01:13 +0900)]
net: ipv6: Fix ping to link-local addresses.
ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.
Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.
Tested: https://android-review.googlesource.com/#/c/254470/ Signed-off-by: Lorenzo Colitti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
roundup_pow_of_two() is undefined when called with an argument of 0, so
let's avoid the call and just fall back to ht->p.min_size (which should
never be smaller than HASH_MIN_SIZE).
Vincent [Sun, 14 Aug 2016 13:38:29 +0000 (15:38 +0200)]
mlxsw: spectrum_router: Fix use after free
In mlxsw_sp_router_fib4_add_info_destroy(), the fib_entry pointer is used
after it has been freed by mlxsw_sp_fib_entry_destroy(). Use a temporary
variable to fix this.
The failure happens when allocating the spinlock array.
Even with GFP_KERNEL its unlikely for such a large allocation
to succeed.
Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding NOWARN for atomic allocations this also makes the bucket-array
sizing more conservative.
In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider size needed by a single spinlock when determining
number of locks per cpu. So with 64 byte per cacheline and 4 byte per
spinlock this gives 32 locks per cpu.
Resulting size of the lock-array (sizeof(spinlock) == 4):
Linus Torvalds [Mon, 15 Aug 2016 02:01:31 +0000 (19:01 -0700)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal updates from Zhang Rui:
- Fix a race condition when updating cooling device, which may lead to
a situation where a thermal governor never updates the cooling
device. From Michele Di Giorgio.
- Fix a zero division error when disabling the forced idle injection
from the intel powerclamp. From Petr Mladek.
- Add suspend/resume callback for intel_pch_thermal thermal driver.
From Srinivas Pandruvada.
- Another two fixes for clocking cooling driver and hwmon sysfs I/F.
From Michele Di Giorgio and Kuninori Morimoto.
[ Hmm. That suspend/resume callback for intel_pch_thermal doesn't look
like a fix, but I'm letting it slide.. - Linus ]
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal: clock_cooling: Fix missing mutex_init()
thermal: hwmon: EXPORT_SYMBOL_GPL for thermal hwmon sysfs
thermal: fix race condition when updating cooling device
thermal/powerclamp: Prevent division by zero when counting interval
thermal: intel_pch_thermal: Add suspend/resume callback
Linus Torvalds [Mon, 15 Aug 2016 01:54:37 +0000 (18:54 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu fix from Greg Ungerer:
"This contains only a single fix for a register corruption problem on
certain types of m68k flat format binaries"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: fix user a5 register being overwritten
Linus Torvalds [Sun, 14 Aug 2016 02:39:38 +0000 (19:39 -0700)]
Merge tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull h8300 and unicore32 architecture fixes from Guenter Roeck:
"Two patches to fix h8300 and unicore32 builds.
unicore32 builds have been broken since v4.6. The fix has been
available in -next since March of this year.
h8300 builds have been broken since the last commit window. The fix
has been available in -next since June of this year"
* tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
h8300: Add missing include file to asm/io.h
unicore32: mm: Add missing parameter to arch_vma_access_permitted
Sabrina Dubroca [Fri, 12 Aug 2016 14:10:33 +0000 (16:10 +0200)]
net: remove type_check from dev_get_nest_level()
The idea for type_check in dev_get_nest_level() was to count the number
of nested devices of the same type (currently, only macvlan or vlan
devices).
This prevented the false positive lockdep warning on configurations such
as:
eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
However, this doesn't prevent a warning on a configuration such as:
In this case, all the locks end up with a nesting subclass of 1, so
lockdep thinks that there is still a deadlock:
- in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
take (vlan_netdev_xmit_lock_key, 1)
- in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
take (macvlan_netdev_addr_lock_key, 1)
By removing the linktype check in dev_get_nest_level() and always
incrementing the nesting depth, lockdep considers this configuration
valid.
Mike Manning [Fri, 12 Aug 2016 11:02:38 +0000 (12:02 +0100)]
net: ipv6: Do not keep IPv6 addresses when IPv6 is disabled
If IPv6 is disabled when the option is set to keep IPv6
addresses on link down, userspace is unaware of this as
there is no such indication via netlink. The solution is to
remove the IPv6 addresses in this case, which results in
netlink messages indicating removal of addresses in the
usual manner. This fix also makes the behavior consistent
with the case of having IPv6 disabled first, which stops
IPv6 addresses from being added.
Fixes: f1705ec197e7 ("net: ipv6: Make address flushing on ifdown optional") Signed-off-by: Mike Manning <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
We should initialize sctp_ht_iter::start_fail to zero if ->start()
succeeds, otherwise it's possible that we leave an old value of 1 there,
which will cause ->stop() to not call sctp_transport_walk_stop(), which
causes all sorts of problems like not calling rcu_read_unlock() (and
preempt_enable()), eventually leading to more warnings like this:
BUG: sleeping function called from invalid context at mm/slab.h:388
in_atomic(): 0, irqs_disabled(): 0, pid: 16551, name: trinity-c2
Preemption disabled at:[<ffffffff819bceb6>] rhashtable_walk_start+0x46/0x150
If iriap_register_lsap() fails to allocate memory, self->lsap is
set to NULL. However, none of the callers handle the failure and
irlmp_connect_request() will happily dereference it:
There's more problems with missing error checks in iriap_init() (and
indeed all of irda_init()), but that's a bigger problem that needs
very careful review and testing. This patch will fix the most serious
bug (as it's easily reached from unprivileged userspace).
Johannes Berg [Fri, 12 Aug 2016 05:48:21 +0000 (07:48 +0200)]
ipv6: suppress sparse warnings in IP6_ECN_set_ce()
Pass the correct type __wsum to csum_sub() and csum_add(). This doesn't
really change anything since __wsum really *is* __be32, but removes the
address space warnings from sparse.
Daniel Borkmann [Thu, 11 Aug 2016 19:38:37 +0000 (21:38 +0200)]
bpf: fix write helpers with regards to non-linear parts
Fix the bpf_try_make_writable() helper and all call sites we have in BPF,
it's currently defect with regards to skbs when the write_len spans into
non-linear parts, no matter if cloned or not.
There are multiple issues at once. First, using skb_store_bits() is not
correct since even if we have a cloned skb, page frags can still be shared.
To really make them private, we need to pull them in via __pskb_pull_tail()
first, which also gets us a private head via pskb_expand_head() implicitly.
This is for helpers like bpf_skb_store_bytes(), bpf_l3_csum_replace(),
bpf_l4_csum_replace(). Really, the only thing reasonable and working here
is to call skb_ensure_writable() before any write operation. Meaning, via
pskb_may_pull() it makes sure that parts we want to access are pulled in and
if not does so plus unclones the skb implicitly. If our write_len still fits
the headlen and we're cloned and our header of the clone is not writable,
then we need to make a private copy via pskb_expand_head(). skb_store_bits()
is a bit misleading and only safe to store into non-linear data in different
contexts such as 357b40a18b04 ("[IPV6]: IPV6_CHECKSUM socket option can
corrupt kernel memory").
For above BPF helper functions, it means after fixed bpf_try_make_writable(),
we've pulled in enough, so that we operate always based on skb->data. Thus,
the call to skb_header_pointer() and skb_store_bits() becomes superfluous.
In bpf_skb_store_bytes(), the len check is unnecessary too since it can
only pass in maximum of BPF stack size, so adding offset is guaranteed to
never overflow. Also bpf_l3/4_csum_replace() helpers must test for proper
offset alignment since they use __sum16 pointer for writing resulting csum.
The remaining helpers that change skb data not discussed here yet are
bpf_skb_vlan_push(), bpf_skb_vlan_pop() and bpf_skb_change_proto(). The
vlan helpers internally call either skb_ensure_writable() (pop case) and
skb_cow_head() (push case, for head expansion), respectively. Similarly,
bpf_skb_proto_xlat() takes care to not mangle page frags.
Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action") Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers") Fixes: 3697649ff29e ("bpf: try harder on clones when writing into skb") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Colin Ian King [Thu, 11 Aug 2016 17:17:22 +0000 (18:17 +0100)]
calipso: fix resource leak on calipso_genopt failure
Currently, if calipso_genopt fails then the error exit path
does not free the ipv6_opt_hdr new causing a memory leak. Fix
this by kfree'ing new on the error exit path.
Linus Torvalds [Sat, 13 Aug 2016 16:56:45 +0000 (09:56 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- an NVMe fix from Gabriel, fixing a suspend/resume issue on some
setups
- addition of a few missing entries in the block queue sysfs
documentation, from Joe
- a fix for a sparse shadow warning for the bvec iterator, from
Johannes
- a writeback deadlock involving raid issuing barriers, and not
flushing the plug when we wakeup the flusher threads. From
Konstantin
- a set of patches for the NVMe target/loop/rdma code, from Roland and
Sagi
* 'for-linus' of git://git.kernel.dk/linux-block:
bvec: avoid variable shadowing warning
doc: update block/queue-sysfs.txt entries
nvme: Suspend all queues before deletion
mm, writeback: flush plugged IO in wakeup_flusher_threads()
nvme-rdma: Remove unused includes
nvme-rdma: start async event handler after reconnecting to a controller
nvmet: Fix controller serial number inconsistency
nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads
nvmet-rdma: Correctly handle RDMA device hot removal
nvme-rdma: Make sure to shutdown the controller if we can
nvme-loop: Remove duplicate call to nvme_remove_namespaces
nvme-rdma: Free the I/O tags when we delete the controller
nvme-rdma: Remove duplicate call to nvme_remove_namespaces
nvme-rdma: Fix device removal handling
nvme-rdma: Queue ns scanning after a sucessful reconnection
nvme-rdma: Don't leak uninitialized memory in connect request private data
Guenter Roeck [Thu, 9 Jun 2016 03:11:58 +0000 (20:11 -0700)]
h8300: Add missing include file to asm/io.h
h8300 builds fail with
arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’
arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’
arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’
Guenter Roeck [Mon, 21 Mar 2016 11:20:53 +0000 (04:20 -0700)]
unicore32: mm: Add missing parameter to arch_vma_access_permitted
unicore32 fails to compile with the following errors.
mm/memory.c: In function ‘__handle_mm_fault’:
mm/memory.c:3381: error:
too many arguments to function ‘arch_vma_access_permitted’
mm/gup.c: In function ‘check_vma_flags’:
mm/gup.c:456: error:
too many arguments to function ‘arch_vma_access_permitted’
mm/gup.c: In function ‘vma_permits_fault’:
mm/gup.c:640: error:
too many arguments to function ‘arch_vma_access_permitted’
Daniel Borkmann [Fri, 12 Aug 2016 20:17:17 +0000 (22:17 +0200)]
bpf: fix bpf_skb_in_cgroup helper naming
While hashing out BPF's current_task_under_cgroup helper bits, it came
to discussion that the skb_in_cgroup helper name was suboptimally chosen.
Tejun says:
So, I think in_cgroup should mean that the object is in that
particular cgroup while under_cgroup in the subhierarchy of that
cgroup. Let's rename the other subhierarchy test to under too. I
think that'd be a lot less confusing going forward.
[...]
It's more intuitive and gives us the room to implement the real
"in" test if ever necessary in the future.
Since this touches uapi bits, we need to change this as long as v4.8
is not yet officially released. Thus, change the helper enum and rename
related bits.
Arnd Bergmann [Wed, 10 Aug 2016 21:54:08 +0000 (23:54 +0200)]
dsa: mv88e6xxx: hide unused functions
When CONFIG_NET_DSA_HWMON is disabled, we get warnings about two unused
functions whose only callers are all inside of an #ifdef:
drivers/net/dsa/mv88e6xxx.c:3257:12: 'mv88e6xxx_mdio_page_write' defined but not used [-Werror=unused-function]
drivers/net/dsa/mv88e6xxx.c:3244:12: 'mv88e6xxx_mdio_page_read' defined but not used [-Werror=unused-function]
This adds another ifdef around the function definitions. The warnings
appeared after the functions were marked 'static', but the problem
was already there before that.
Signed-off-by: Arnd Bergmann <[email protected]> Fixes: 57d3231057e9 ("net: dsa: mv88e6xxx: fix style issues") Reviewed-by: Vivien Didelot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Linus Torvalds [Fri, 12 Aug 2016 23:28:41 +0000 (16:28 -0700)]
Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for
races in the LOCK code which appear to go back to the big nfsd state
lock removal from 3.17"
* tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux:
nfsd: don't return an unhashed lock stateid after taking mutex
nfsd: Fix race between FREE_STATEID and LOCK
nfsd: fix dentry refcounting on create
Linus Torvalds [Fri, 12 Aug 2016 23:23:58 +0000 (16:23 -0700)]
Merge tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Two hibernation fixes allowing it to work with the recently added
randomization of the kernel identity mapping base on x86-64 and one
cpufreq driver regression fix.
Specifics:
- Fix the x86 identity mapping creation helpers to avoid the
assumption that the base address of the mapping will always be
aligned at the PGD level, as it may be aligned at the PUD level if
address space randomization is enabled (Rafael Wysocki).
- Fix the hibernation core to avoid executing tracing functions
before restoring the processor state completely during resume
(Thomas Garnier).
- Fix a recently introduced regression in the powernv cpufreq driver
that causes it to crash due to an out-of-bounds array access
(Akshay Adiga)"
* tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / hibernate: Restore processor state before using per-CPU variables
x86/power/64: Always create temporary identity mapping correctly
cpufreq: powernv: Fix crash in gpstate_timer_handler()
Linus Torvalds [Fri, 12 Aug 2016 21:31:10 +0000 (14:31 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"This is bigger than usual - the reason is partly a pent-up stream of
fixes after the merge window and partly accidental. The fixes are:
- five patches to fix a boot failure on Andy Lutomirsky's laptop
- four SGI UV platform fixes
- KASAN fix
- warning fix
- documentation update
- swap entry definition fix
- pkeys fix
- irq stats fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
x86/efi: Allocate a trampoline if needed in efi_free_boot_services()
x86/boot: Rework reserve_real_mode() to allow multiple tries
x86/boot: Defer setup_real_mode() to early_initcall time
x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly
x86/boot: Run reserve_bios_regions() after we initialize the memory map
x86/irq: Do not substract irq_tlb_count from irq_call_count
x86/mm: Fix swap entry comment and macro
x86/mm/kaslr: Fix -Wformat-security warning
x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation
x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables
x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems
x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values
x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash
x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous
x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET
x86/mm: Disable preemption during CR3 read+write
x86/mm/KASLR: Increase BRK pages for KASLR memory randomization
x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization
x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
Linus Torvalds [Fri, 12 Aug 2016 20:55:06 +0000 (13:55 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Misc fixes: a /dev/rtc regression fix, two APIC timer period
calibration fixes, an ARM clocksource driver fix and a NOHZ
power use regression fix"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup
x86/timers/apic: Inform TSC deadline clockevent device about recalibration
x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error
timers: Fix get_next_timer_interrupt() computation
clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered
Thomas Garnier [Thu, 11 Aug 2016 21:49:29 +0000 (14:49 -0700)]
PM / hibernate: Restore processor state before using per-CPU variables
Restore the processor state before calling any other functions to
ensure per-CPU variables can be used with KASLR memory randomization.
Tracing functions use per-CPU variables (GS based on x86) and one was
called just before restoring the processor state fully. It resulted
in a double fault when both the tracing & the exception handler
functions tried to use a per-CPU variable.
Andy Yan [Fri, 12 Aug 2016 10:01:50 +0000 (18:01 +0800)]
power: reset: reboot-mode: fix build error of missing ioremap/iounmap on UM
commit 4fcd504edbf7 ("power: reset: add reboot mode driver") uses api from
syscon, and syscon uses ioremap/iounmap which depends on HAS_IOMEM, so
let's depend on MFD_SYSCON instead of selecting it directly to avoid the
um-allyesconfig like build error on archs that without iomem:
drivers/mfd/syscon.c: In function 'of_syscon_register':
drivers/mfd/syscon.c:67:9: error: implicit declaration of function 'ioremap' [-Werror=implicit-function-declaration]
base = ioremap(res.start, resource_size(&res));
^
drivers/mfd/syscon.c:67:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
base = ioremap(res.start, resource_size(&res));
^
drivers/mfd/syscon.c:109:2: error: implicit declaration of function 'iounmap' [-Werror=implicit-function-declaration]
iounmap(base);
^
power: supply: max17042_battery: fix model download bug.
The device's model download function returns the model data as
an array of u32s, which is later compared to the reference
model data. However, since the latter is an array of u16s,
the comparison does not happen correctly, and model verification
fails. This in turn breaks the POR initialization sequence.
Fixes: 39e7213edc4f3 ("max17042_battery: Support regmap to access device's registers") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Sven Van Asbroeck <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Sebastian Reichel <[email protected]>
Linus Torvalds [Fri, 12 Aug 2016 20:21:18 +0000 (13:21 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Mostly tooling fixes, plus two uncore-PMU fixes, an uprobes fix, a
perf-cgroups fix and an AUX events fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Add enable_box for client MSR uncore
perf/x86/intel/uncore: Fix uncore num_counters
uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
perf/core: Set cgroup in CPU contexts for new cgroup events
perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash
perf probe ppc64le: Fix probe location when using DWARF
perf probe: Add function to post process kernel trace events
tools: Sync cpufeatures headers with the kernel
toops: Sync tools/include/uapi/linux/bpf.h with the kernel
tools: Sync cpufeatures.h and vmx.h with the kernel
perf probe: Support signedness casting
perf stat: Avoid skew when reading events
perf probe: Fix module name matching
perf probe: Adjust map->reloc offset when finding kernel symbol from map
perf hists: Trim libtraceevent trace_seq buffers
perf script: Add 'bpf-output' field to usage message
Jeff Layton [Thu, 11 Aug 2016 14:37:39 +0000 (10:37 -0400)]
nfsd: don't return an unhashed lock stateid after taking mutex
nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (a'la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.
Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.
Linus Torvalds [Fri, 12 Aug 2016 19:46:37 +0000 (12:46 -0700)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Misc fixes: lockstat fix, futex fix on !MMU systems, big endian fix
for qrwlocks and a race fix for pvqspinlocks"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock: Fix a bug in qstat_read()
locking/pvqspinlock: Fix double hash race
locking/qrwlock: Fix write unlock bug on big endian systems
futex: Assume all mappings are private on !MMU systems
Linus Torvalds [Fri, 12 Aug 2016 19:39:02 +0000 (12:39 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"A fix for EFI capsules and an SGI UV platform fix"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/capsule: Allocate whole capsule into virtual memory
x86/platform/uv: Skip UV runtime services mapping in the efi_runtime_disabled case
Linus Torvalds [Fri, 12 Aug 2016 19:32:24 +0000 (12:32 -0700)]
Merge tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
- Stable patch from Olga to fix RPCSEC_GSS upcalls when the same user
needs multiple different security services (e.g. krb5i and krb5p).
- Stable patch to fix a regression introduced by the use of
SO_REUSEPORT, and that prevented the use of multiple different NFS
versions to the same server.
- TCP socket reconnection timer fixes.
- Patch from Neil to disable the use of IPv6 temporary addresses"
* tag 'nfs-for-4.8-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Cap the transport reconnection timer at 1/2 lease period
NFSv4: Cleanup the setting of the nfs4 lease period
SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout
SUNRPC: Fix reconnection timeouts
NFSv4.2: LAYOUTSTATS may return NFS4ERR_ADMIN/DELEG_REVOKED
SUNRPC: disable the use of IPv6 temporary addresses.
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Fix up socket autodisconnect
SUNRPC: Handle EADDRNOTAVAIL on connection failures
Linus Torvalds [Fri, 12 Aug 2016 19:28:23 +0000 (12:28 -0700)]
Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fixes from Dan Williams:
- Fix for the nd_blk (NVDIMM Block Window Aperture) driver.
A spec clarification requires the driver to mask off reserved bits in
status register. This is tagged for -stable back to the v4.2 kernel.
- Fix for a kernel crash in the nvdimm unit tests when module loading
is interrupted with SIGTERM. Tagged for -stable since validation
efforts external to Intel use the unit tests for qualifying
backports.
- Add a new 'size' sysfs attribute for the BTT (NVDIMM Block
Translation Table) driver to make it symmetric with the other
namespace personality drivers (PFN and DAX) that provide a size
attribute for indicating how much namespace capacity is lost to
metadata.
The BTT change arrived at the start of the merge window and has
appeared in a -next release. It can technically wait for 4.9, but it
is small, fixes asymmetry in the libnvdimm-sysfs interface, and
something I would have squeezed into the v4.8 pull request had it
arrived a few days earlier.
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
tools/testing/nvdimm: fix SIGTERM vs hotplug crash
nvdimm, btt: add a size attribute for BTTs
libnvdimm, nd_blk: mask off reserved status bits
Linus Torvalds [Fri, 12 Aug 2016 19:26:59 +0000 (12:26 -0700)]
Merge tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A regression fix of HD-audio runtime PM and two USB quirks"
* tag 'sound-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Manage power well properly for resume
ALSA: usb-audio: Add quirk for ELP HD USB Camera
ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610)
Linus Torvalds [Fri, 12 Aug 2016 19:09:44 +0000 (12:09 -0700)]
Merge tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some powerpc fixes for 4.8:
Misc:
- powerpc/vdso: Fix build rules to rebuild vdsos correctly from Nicholas Piggin
- powerpc/ptrace: Fix coredump since ptrace TM changes from Cyril Bur
- powerpc/32: Fix csum_partial_copy_generic() from Christophe Leroy
- cxl: Set psl_fir_cntl to production environment value from Frederic Barrat
- powerpc/eeh: Switch to conventional PCI address output in EEH log from Guilherme G. Piccoli
- cxl: Use fixed width predefined types in data structure. from Philippe Bergheaud
- powerpc/vdso: Add missing include file from Guenter Roeck
- powerpc: Fix unused function warning 'lmb_to_memblock' from Alastair D'Silva
- powerpc/powernv/ioda: Fix TCE invalidate to work in real mode again from Alexey Kardashevskiy
- powerpc/cell: Add missing error code in spufs_mkgang() from Dan Carpenter
- crypto: crc32c-vpmsum - Convert to CPU feature based module autoloading from Anton Blanchard
- powerpc/pasemi: Fix coherent_dma_mask for dma engine from Darren Stevens
Benjamin Herrenschmidt:
- powerpc/32: Fix crash during static key init
- powerpc: Update obsolete comment in setup_32.c about early_init()
- powerpc: Print the kernel load address at the end of prom_init()
- powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
- powerpc/xics: Properly set Edge/Level type and enable resend
Mahesh Salgaonkar:
- powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
- powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
- powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
- powerpc/powernv: Load correct TOC pointer while waking up from winkle.
Andrew Donnellan:
- cxl: Fix sparse warnings
- cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
Michael Ellerman:
- selftests/powerpc: Specify we expect to build with std=gnu99
- powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
- powerpc/pci: Fix endian bug in fixed PHB numbering"
* tag 'powerpc-4.8-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (26 commits)
selftests/powerpc: Specify we expect to build with std=gnu99
powerpc/vdso: Fix build rules to rebuild vdsos correctly
powerpc/Makefile: Use cflags-y/aflags-y for setting endian options
powerpc/32: Fix crash during static key init
powerpc: Update obsolete comment in setup_32.c about early_init()
powerpc: Print the kernel load address at the end of prom_init()
powerpc/ptrace: Fix coredump since ptrace TM changes
powerpc/32: Fix csum_partial_copy_generic()
cxl: Set psl_fir_cntl to production environment value
powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARs
powerpc/book3s: Fix MCE console messages for unrecoverable MCE.
powerpc/pci: Fix endian bug in fixed PHB numbering
powerpc/eeh: Switch to conventional PCI address output in EEH log
cxl: Fix sparse warnings
cxl: Fix NULL dereference in cxl_context_init() on PowerVM guests
cxl: Use fixed width predefined types in data structure.
powerpc/vdso: Add missing include file
powerpc: Fix unused function warning 'lmb_to_memblock'
powerpc/powernv: Fix MCE handler to avoid trashing CR0/CR1 registers.
powerpc/powernv: Move IDLE_STATE_ENTER_SEQ macro to cpuidle.h
...
Riku Voipio [Wed, 15 Jun 2016 08:27:59 +0000 (11:27 +0300)]
arm64: defconfig: add options for virtualization and containers
Enable options commonly needed by popular virtualization
and container applications. Use modules when possible to
avoid too much overhead for users not interested.
- add namespace and cgroup options needed
- add seccomp - optional, but enhances Qemu etc
- bridge, nat, veth, macvtap and multicast for routing
guests and containers
- btfrs and overlayfs modules for container COW backends
- while near it, make fuse a module instead of built-in.
Generated with make saveconfig and dropping unrelated spurious
change hunks while commiting. bloat-o-meter old-vmlinux vmlinux:
Mark Rutland [Thu, 11 Aug 2016 13:11:06 +0000 (14:11 +0100)]
arm64: hibernate: handle allocation failures
In create_safe_exec_page(), we create a copy of the hibernate exit text,
along with some page tables to map this via TTBR0. We then install the
new tables in TTBR0.
In swsusp_arch_resume() we call create_safe_exec_page() before trying a
number of operations which may fail (e.g. copying the linear map page
tables). If these fail, we bail out of swsusp_arch_resume() and return
an error code, but leave TTBR0 as-is. Subsequently, the core hibernate
code will call free_basic_memory_bitmaps(), which will free all of the
memory allocations we made, including the page tables installed in
TTBR0.
Thus, we may have TTBR0 pointing at dangling freed memory for some
period of time. If the hibernate attempt was triggered by a user
requesting a hibernate test via the reboot syscall, we may return to
userspace with the clobbered TTBR0 value.
Avoid these issues by reorganising swsusp_arch_resume() such that we
have no failure paths after create_safe_exec_page(). We also add a check
that the zero page allocation succeeded, matching what we have for other
allocations.
Mark Rutland [Thu, 11 Aug 2016 13:11:05 +0000 (14:11 +0100)]
arm64: hibernate: avoid potential TLB conflict
In create_safe_exec_page we install a set of global mappings in TTBR0,
then subsequently invalidate TLBs. While TTBR0 points at the zero page,
and the TLBs should be free of stale global entries, we may have stale
ASID-tagged entries (e.g. from the EFI runtime services mappings) for
the same VAs. Per the ARM ARM these ASID-tagged entries may conflict
with newly-allocated global entries, and we must follow a
Break-Before-Make approach to avoid issues resulting from this.
This patch reworks create_safe_exec_page to invalidate TLBs while the
zero page is still in place, ensuring that there are no potential
conflicts when the new TTBR0 value is installed. As a single CPU is
online while this code executes, we do not need to perform broadcast TLB
maintenance, and can call local_flush_tlb_all(), which also subsumes
some barriers. The remaining assembly is converted to use write_sysreg()
and isb().
Other than this, we safely manipulate TTBRs in the hibernate dance. The
code we install as part of the new TTBR0 mapping (the hibernated
kernel's swsusp_arch_suspend_exit) installs a zero page into TTBR1,
invalidates TLBs, then installs its preferred value. Upon being restored
to the middle of swsusp_arch_suspend, the new image will call
__cpu_suspend_exit, which will call cpu_uninstall_idmap, installing the
zero page in TTBR0 and invalidating all TLB entries.
Executing from a non-executable area gives an ugly message:
lkdtm: Performing direct entry EXEC_RODATA
lkdtm: attempting ok execution at ffff0000084c0e08
lkdtm: attempting bad execution at ffff000008880700
Bad mode in Synchronous Abort handler detected on CPU2, code 0x8400000e -- IABT (current EL)
CPU: 2 PID: 998 Comm: sh Not tainted 4.7.0-rc2+ #13
Hardware name: linux,dummy-virt (DT)
task: ffff800077e35780 ti: ffff800077970000 task.ti: ffff800077970000
PC is at lkdtm_rodata_do_nothing+0x0/0x8
LR is at execute_location+0x74/0x88
The 'IABT (current EL)' indicates the error but it's a bit cryptic
without knowledge of the ARM ARM. There is also no indication of the
specific address which triggered the fault. The increase in kernel
page permissions makes hitting this case more likely as well.
Handling the case in the vectors gives a much more familiar looking
error message:
Propagate errors from kvm_mips_handle_kseg0_tlb_fault() and
kvm_mips_handle_mapped_seg_tlb_fault(), usually triggering an internal
error since they normally indicate the guest accessed bad physical
memory or the commpage in an unexpected way.
James Hogan [Thu, 11 Aug 2016 10:58:13 +0000 (11:58 +0100)]
MIPS: KVM: Add missing gfn range check
kvm_mips_handle_mapped_seg_tlb_fault() calculates the guest frame number
based on the guest TLB EntryLo values, however it is not range checked
to ensure it lies within the guest_pmap. If the physical memory the
guest refers to is out of range then dump the guest TLB and emit an
internal error.
kvm_mips_handle_mapped_seg_tlb_fault() appears to map the guest page at
virtual address 0 to PFN 0 if the guest has created its own mapping
there. The intention is unclear, but it may have been an attempt to
protect the zero page from being mapped to anything but the comm page in
code paths you wouldn't expect from genuine commpage accesses (guest
kernel mode cache instructions on that address, hitting trapping
instructions when executing from that address with a coincidental TLB
eviction during the KVM handling, and guest user mode accesses to that
address).
Fix this to check for mappings exactly at KVM_GUEST_COMMPAGE_ADDR (it
may not be at address 0 since commit 42aa12e74e91 ("MIPS: KVM: Move
commpage so 0x0 is unmapped")), and set the corresponding EntryLo to be
interpreted as 0 (invalid).
KVM: Protect device ops->create and list_add with kvm->lock
KVM devices were manipulating list data structures without any form of
synchronization, and some implementations of the create operations also
suffered from a lack of synchronization.
Now when we've split the xics create operation into create and init, we
can hold the kvm->lock mutex while calling the create operation and when
manipulating the devices list.
The error path in the generic code gets slightly ugly because we have to
take the mutex again and delete the device from the list, but holding
the mutex during anon_inode_getfd or releasing/locking the mutex in the
common non-error path seemed wrong.
As we are about to hold the kvm->lock during the create operation on KVM
devices, we should move the call to xics_debugfs_init into its own
function, since holding a mutex over extended amounts of time might not
be a good idea.
Introduce an init operation on the kvm_device_ops struct which cannot
fail and call this, if configured, after the device has been created.
KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed
When triggering KVM_RUN without a user memory region being mapped
(KVM_SET_USER_MEMORY_REGION) a validity intercept occurs. This could
happen, if the user memory region was not mapped initially or if it
was unmapped after the vcpu is initialized. The function
kvm_s390_handle_requests checks for the KVM_REQ_MMU_RELOAD bit. The
check function always clears this bit. If gmap_mprotect_notify
returns an error code, the mapping failed, but the KVM_REQ_MMU_RELOAD
was not set anymore. So the next time kvm_s390_handle_requests is
called, the execution would fall trough the check for
KVM_REQ_MMU_RELOAD. The bit needs to be resetted, if
gmap_mprotect_notify returns an error code. Resetting the bit with
kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu) fixes the bug.
When KVM_RUN is triggered on a VCPU without an initial reset, a
validity intercept occurs.
Setting the prefix will set the KVM_REQ_MMU_RELOAD bit initially,
thus preventing the bug.
Kan Liang [Thu, 11 Aug 2016 14:31:14 +0000 (07:31 -0700)]
perf/x86/intel/uncore: Add enable_box for client MSR uncore
There are bug reports about miscounting uncore counters on some
client machines like Sandybridge, Broadwell and Skylake. It is
very likely to be observed on idle systems.
This issue is caused by a hardware issue. PERF_GLOBAL_CTL could be
cleared after Package C7, and nothing will be count.
The related errata (HSD 158) could be found in:
This patch tries to work around this issue by re-enabling PERF_GLOBAL_CTL
in ->enable_box(). The workaround does not cover all cases. It helps for new
events after returning from C7. But it cannot prevent C7, it will still
miscount if a counter is already active.
There is no drawback in leaving it enabled, so it does not need
disable_box() here.
Kan Liang [Thu, 11 Aug 2016 14:30:20 +0000 (07:30 -0700)]
perf/x86/intel/uncore: Fix uncore num_counters
Some uncore boxes' num_counters value for Haswell server and
Broadwell server are not correct (too large, off by one).
This issue was found by comparing the code with the document. Although
there is no bug report from users yet, accessing non-existent counters
is dangerous and the behavior is undefined: it may cause miscounting or
even crashes.
This patch makes them consistent with the uncore document.
Denys Vlasenko [Thu, 11 Aug 2016 15:45:21 +0000 (17:45 +0200)]
uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
Since instruction decoder now supports EVEX-encoded instructions, two fixes
are needed to correctly handle them in uprobes.
Extended bits for MODRM.rm field need to be sanitized just like we do it
for VEX3, to avoid encoding wrong register for register-relative access.
EVEX has _two_ extended bits: b and x. Theoretically, EVEX.x should be
ignored by the CPU (since GPRs go only up to 15, not 31), but let's be
paranoid here: proper encoding for register-relative access
should have EVEX.x = 1.
Secondly, we should fetch vex.vvvv for EVEX too.
This is now super easy because instruction decoder populates
vex_prefix.bytes[2] for all flavors of (e)vex encodings, even for VEX2.
Linus Torvalds [Thu, 11 Aug 2016 23:58:24 +0000 (16:58 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
"7 fixes"
* emailed patches from Andrew Morton <[email protected]>:
mm/memory_hotplug.c: initialize per_cpu_nodestats for hotadded pgdats
mm, oom: fix uninitialized ret in task_will_free_mem()
kasan: remove the unnecessary WARN_ONCE from quarantine.c
mm: memcontrol: fix memcg id ref counter on swap charge move
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
proc, meminfo: use correct helpers for calculating LRU sizes in meminfo
mm/hugetlb: fix incorrect hugepages count during mem hotplug
kasan: remove the unnecessary WARN_ONCE from quarantine.c
It's quite unlikely that the user will so little memory that the per-CPU
quarantines won't fit into the given fraction of the available memory.
Even in that case he won't be able to do anything with the information
given in the warning.
Vladimir Davydov [Thu, 11 Aug 2016 22:33:03 +0000 (15:33 -0700)]
mm: memcontrol: fix memcg id ref counter on swap charge move
Since commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
after many small jobs") swap entries do not pin memcg->css.refcnt
directly. Instead, they pin memcg->id.ref. So we should adjust the
reference counters accordingly when moving swap charges between cgroups.
Vladimir Davydov [Thu, 11 Aug 2016 22:33:00 +0000 (15:33 -0700)]
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
An offline memory cgroup might have anonymous memory or shmem left
charged to it and no swap. Since only swap entries pin the id of an
offline cgroup, such a cgroup will have no id and so an attempt to
swapout its anon/shmem will not store memory cgroup info in the swap
cgroup map. As a result, memcg->swap or memcg->memsw will never get
uncharged from it and any of its ascendants.
Fix this by always charging swapout to the first ancestor cgroup that
hasn't released its id yet.
Mel Gorman [Thu, 11 Aug 2016 22:32:57 +0000 (15:32 -0700)]
proc, meminfo: use correct helpers for calculating LRU sizes in meminfo
meminfo_proc_show() and si_mem_available() are using the wrong helpers
for calculating the size of the LRUs. The user-visible impact is that
there appears to be an abnormally high number of unevictable pages.
: dissolve_free_huge_page intends to break a hugepage into buddy, and the
: destination hugepage is supposed to be allocated from the pool of the
: destination node, so the system-wide pool size is reduced. So adding
: h->max_huge_pages-- makes sense to me.
Linus Torvalds [Thu, 11 Aug 2016 21:14:23 +0000 (14:14 -0700)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A couple of bug fixes have come in for v4.8 so far. Since the first
few were originally meant to go into -rc1 (but didn't get sent in time
for travel reasons), the branch is unfortunately based on top of a
commit in the middle of the merge window rather than -rc1.
Content-wise we have:
- a fix for the last remaining broken build in kernelci, getting
mach-shmobile to build again with SMP disabled
- a fix for a realview regression that broke real hardware but not
the qemu model that everyone uses in practice (needed for v4.7 as
well)
- a merge conflict fix for Tegra that also broke v4.7
- two Kconfig fixes for arm64 build regressions
- a couple of arm32 build warning fixes (all harmless)
- fix the RTC on Exynos7 Espresso (which apparently never worked
right)"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
Merge tag 'pxa-fixes-v4.8' of https://github.com/rjarzmik/linux into randconfig-4.8
arm64: Kconfig: select HISILICON_IRQ_MBIGEN only if PCI is selected
arm64: Kconfig: select ALPINE_MSI only if PCI is selected
ARM: dts: realview: Fix PBX-A9 cache description
ARM: tegra: fix erroneous address in dts
ARM: dts: add syscon compatible string for AP syscon
ARM: dts: add syscon compatible string for CP syscon
ARM: oxnas: select reset controller framework
ARM: hide mach-*/ include for ARM_SINGLE_ARMV7M
ARM: don't include removed directories
Revert "ARM: aspeed: adapt defconfigs for new CONFIG_PRINTK_TIME"
ARM: shmobile: don't call platform_can_secondary_boot on UP
MAINTAINER: alpine: add a mailing list
ARM: do away with final ARCH_REQUIRE_GPIOLIB
arm64: dts: Fix RTC by providing rtc_src clock
Linus Torvalds [Thu, 11 Aug 2016 21:10:23 +0000 (14:10 -0700)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost fixes and cleanups from Michael Tsirkin:
"Misc fixes and cleanups all over the place"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio/s390: deprecate old transport
virtio/s390: keep early_put_chars
virtio_blk: Fix a slient kernel panic
virtio-vsock: fix include guard typo
vhost/vsock: fix vhost virtio_vsock_pkt use-after-free
9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc()
virtio: fix error handling for debug builds
virtio: fix memory leak in virtqueue_add()
Linus Torvalds [Thu, 11 Aug 2016 20:53:34 +0000 (13:53 -0700)]
Merge tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"A patch for a NULL dereference bug introduced in 4.8-rc1 and a handful
of static checker fixes"
* tag 'ceph-for-4.8-rc2' of https://github.com/ceph/ceph-client:
ceph: initialize pathbase in the !dentry case in encode_caps_cb()
rbd: nuke the 32-bit pool id check
rbd: destroy header_oloc in rbd_dev_release()
ceph: fix null pointer dereference in ceph_flush_snaps()
libceph: using kfree_rcu() to simplify the code
libceph: make cancel_generic_request() static
libceph: fix return value check in alloc_msg_with_page_vector()
Chuck Lever [Thu, 11 Aug 2016 14:37:30 +0000 (10:37 -0400)]
nfsd: Fix race between FREE_STATEID and LOCK
When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:
Frame 324 R OPEN stateid [2,O]
Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID
In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.
To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.
Sabrina Dubroca [Thu, 11 Aug 2016 13:24:27 +0000 (15:24 +0200)]
macsec: use after free when deleting the underlying device
macsec_notify() loops over the list of macsec devices configured on the
underlying device when this device is being removed. This list is part
of the rx_handler data.
However, macsec_dellink unregisters the rx_handler and frees the
rx_handler data when the last macsec device is removed from the
underlying device.
Add macsec_common_dellink() to delete macsec devices without
unregistering the rx_handler and freeing the associated data.
Fixes: 960d5848dbf1 ("macsec: fix memory leaks around rx_handler (un)registration") Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Jason Wang [Thu, 11 Aug 2016 10:15:56 +0000 (18:15 +0800)]
macvtap: fix use after free for skb_array during release
We've clean skb_array in macvtap_put_queue() but still try to pop from
it during macvtap_sock_destruct(). Fix this use after free by moving
the skb array cleanup to macvtap_sock_destruct() instead.
David A. Long [Wed, 10 Aug 2016 20:44:51 +0000 (16:44 -0400)]
arm64: Remove stack duplicating code from jprobes
Because the arm64 calling standard allows stacked function arguments to be
anywhere in the stack frame, do not attempt to duplicate the stack frame for
jprobes handler functions.
Documentation changes to describe this issue have been broken out into a
separate patch in order to simultaneously address them in other
architecture(s).
Josef Bacik [Wed, 10 Aug 2016 18:46:27 +0000 (14:46 -0400)]
nfsd: fix dentry refcounting on create
b44061d0b9 introduced a dentry ref counting bug. Previously we were
grabbing one ref to dchild in nfsd_create(), but with the creation of
nfsd_create_locked() we have a ref for dchild from the lookup in
nfsd_create(), and then another ref in nfsd_create_locked(). The ref
from the lookup in nfsd_create() is never dropped and results in
dentries still in use at unmount.
When nvme_delete_queue fails in the first pass of the
nvme_disable_io_queues() loop, we return early, failing to suspend all
of the IO queues. Later, on the nvme_pci_disable path, this causes us
to disable MSI without actually having freed all the IRQs, which
triggers the BUG_ON in free_msi_irqs(), as show below.
This patch refactors nvme_disable_io_queues to suspend all queues before
start submitting delete queue commands. This way, we ensure that we
have at least returned every IRQ before continuing with the removal
path.
x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe()
I made a mistake while converting the driver to the hotplug state
machine and as a result x2apic_cluster_probe() was accessing
cpus_in_cluster before allocating it.
This patch fixes it by setting the cpumask after the allocation the
memory succeeded.
While at it, I marked two functions static which are only used within
this file.