Ilya Dryomov [Fri, 21 Nov 2014 19:16:43 +0000 (22:16 +0300)]
rbd: don't treat CEPH_OSD_OP_DELETE as extent op
CEPH_OSD_OP_DELETE is not an extent op, stop treating it as such. This
sneaked in with discard patches - it's one of the three osd ops (the
other two are CEPH_OSD_OP_TRUNCATE and CEPH_OSD_OP_ZERO) that discard
is implemented with.
Yan, Zheng [Thu, 6 Nov 2014 07:09:41 +0000 (15:09 +0800)]
ceph: introduce global empty snap context
Current snaphost code does not properly handle moving inode from one
empty snap realm to another empty snap realm. After changing inode's
snap realm, some dirty pages' snap context can be not equal to inode's
i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs()
The fix is introduce a global empty snap context for all empty snap
realm. This avoids triggering the BUG() for filesystem with no snapshot.
ceph, rbd: delete unnecessary checks before two function calls
The functions ceph_put_snap_context() and iput() test whether their
argument is NULL and then return immediately. Thus the test around the
call is not needed.
This issue was detected by using the Coccinelle software.
Yan, Zheng [Wed, 22 Oct 2014 01:09:56 +0000 (18:09 -0700)]
ceph: introduce a new inode flag indicating if cached dentries are ordered
After creating/deleting/renaming file, offsets of sibling dentries may
change. So we can not use cached dentries to satisfy readdir. But we can
still use the cached dentries to conclude -ENOENT for lookup.
This patch introduces a new inode flag indicating if child dentries are
ordered. The flag is set at the same time marking a directory complete.
After creating/deleting/renaming file, we clear the flag on directory
inode. This prevents ceph_readdir() from using cached dentries to satisfy
readdir syscall.
Ilya Dryomov [Thu, 23 Oct 2014 12:32:57 +0000 (16:32 +0400)]
libceph: nuke ceph_kvfree()
Use kvfree() from linux/mm.h instead, which is identical. Also fix the
ceph_buffer comment: we will allocate with kmalloc() up to 32k - the
value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an
implementation detail so don't mention it at all.
Yan, Zheng [Tue, 14 Oct 2014 02:33:35 +0000 (10:33 +0800)]
ceph: fix file lock interruption
When a lock operation is interrupted, current code sends a unlock request to
MDS to undo the lock operation. This method does not work as expected because
the unlock request can drop locks that have already been acquired.
The fix is use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR
requests to interrupt blocked file lock request. These requests do not drop
locks that have alread been acquired, they only interrupt blocked file lock
request.
Johannes Berg [Wed, 17 Dec 2014 12:55:49 +0000 (13:55 +0100)]
mac80211: free management frame keys when removing station
When writing the code to allow per-station GTKs, I neglected to
take into account the management frame keys (index 4 and 5) when
freeing the station and only added code to free the first four
data frame keys.
Fix this by iterating the array of keys over the right length.
Currently the H_CONFER hcall is implemented in kernel virtual mode,
meaning that whenever a guest thread does an H_CONFER, all the threads
in that virtual core have to exit the guest. This is bad for
performance because it interrupts the other threads even if they
are doing useful work.
The H_CONFER hcall is called by a guest VCPU when it is spinning on a
spinlock and it detects that the spinlock is held by a guest VCPU that
is currently not running on a physical CPU. The idea is to give this
VCPU's time slice to the holder VCPU so that it can make progress
towards releasing the lock.
To avoid having the other threads exit the guest unnecessarily,
we add a real-mode implementation of H_CONFER that checks whether
the other threads are doing anything. If all the other threads
are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
it returns H_TOO_HARD which causes a guest exit and allows the
H_CONFER to be handled in virtual mode.
Otherwise it spins for a short time (up to 10 microseconds) to give
other threads the chance to observe that this thread is trying to
confer. The spin loop also terminates when any thread exits the guest
or when all other threads are idle or trying to confer. If the
timeout is reached, the H_CONFER returns H_SUCCESS. In this case the
guest VCPU will recheck the spinlock word and most likely call
H_CONFER again.
This also improves the implementation of the H_CONFER virtual mode
handler. If the VCPU is part of a virtual core (vcore) which is
runnable, there will be a 'runner' VCPU which has taken responsibility
for running the vcore. In this case we yield to the runner VCPU
rather than the target VCPU.
We also introduce a check on the target VCPU's yield count: if it
differs from the yield count passed to H_CONFER, the target VCPU
has run since H_CONFER was called and may have already released
the lock. This check is required by PAPR.
Paul Mackerras [Wed, 3 Dec 2014 02:30:39 +0000 (13:30 +1100)]
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
There are two ways in which a guest instruction can be obtained from
the guest in the guest exit code in book3s_hv_rmhandlers.S. If the
exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
instruction), the offending instruction is in the HEIR register
(Hypervisor Emulation Instruction Register). If the exit was caused
by a load or store to an emulated MMIO device, we load the instruction
from the guest by turning data relocation on and loading the instruction
with an lwz instruction.
Unfortunately, in the case where the guest has opposite endianness to
the host, these two methods give results of different endianness, but
both get put into vcpu->arch.last_inst. The HEIR value has been loaded
using guest endianness, whereas the lwz will load the instruction using
host endianness. The rest of the code that uses vcpu->arch.last_inst
assumes it was loaded using host endianness.
To fix this, we define a new vcpu field to store the HEIR value. Then,
in kvmppc_handle_exit_hv(), we transfer the value from this new field to
vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
differ.
Paul Mackerras [Wed, 3 Dec 2014 02:30:38 +0000 (13:30 +1100)]
KVM: PPC: Book3S HV: Remove code for PPC970 processors
This removes the code that was added to enable HV KVM to work
on PPC970 processors. The PPC970 is an old CPU that doesn't
support virtualizing guest memory. Removing PPC970 support also
lets us remove the code for allocating and managing contiguous
real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
case, the code for pinning pages of guest memory when first
accessed and keeping track of which pages have been pinned, and
the code for handling H_ENTER hypercalls in virtual mode.
Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
The KVM_CAP_PPC_RMA capability now always returns 0.
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
This patch adds trace points in the guest entry and exit code and also
for exceptions handled by the host in kernel mode - hypercalls and page
faults. The new events are added to /sys/kernel/debug/tracing/events
under a new subsystem called kvm_hv.
Paul Mackerras [Thu, 4 Dec 2014 05:43:28 +0000 (16:43 +1100)]
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
Currently the calculations of stolen time for PPC Book3S HV guests
uses fields in both the vcpu struct and the kvmppc_vcore struct. The
fields in the kvmppc_vcore struct are protected by the
vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
running the virtual core. This works correctly but confuses lockdep,
because it sees that the code takes the tbacct_lock for a vcpu in
kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
vcore_stolen_time(), and it thinks there is a possibility of deadlock,
causing it to print reports like this:
=============================================
[ INFO: possible recursive locking detected ] 3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
---------------------------------------------
qemu-system-ppc/6188 is trying to acquire lock:
(&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
but task is already holding lock:
(&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
other info that might help us debug this:
Possible unsafe locking scenario:
In order to make the locking easier to analyse, we change the code to
use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
preempt_tb fields. This lock needs to be an irq-safe lock since it is
used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
functions, which are called with the scheduler rq lock held, which is
an irq-safe lock.
When running in non-cache coherent configuration the memory that was
allocated with dma_alloc_coherent() has a custom mapping and so there is no
1-to-1 relationship between the kernel virtual address and the PFN. This
means that virt_to_pfn() will not work correctly for those addresses and the
default mmap implementation in the form of dma_common_mmap() will map some
random, but not the requested, memory area.
Fix this by providing a custom mmap implementation that looks up the PFN
from the page table rather than using virt_to_pfn.
Ley Foon Tan [Wed, 17 Dec 2014 05:53:41 +0000 (13:53 +0800)]
nios2/uaccess: fix sparse errors
virtio wants to read bitwise types from userspace using get_user. At the
moment this triggers sparse errors, since the value is passed through an
integer.
Hans Verkuil [Tue, 2 Dec 2014 15:40:33 +0000 (12:40 -0300)]
[media] bq/c-qcam, w9966, pms: move to staging in preparation for removal
These drivers haven't been tested in a long, long time. The hardware is
ancient and hopelessly obsolete. These drivers also need to be converted
to newer media frameworks but due to the lack of hardware that's going
to be impossible. In addition, cheaper and vastly better hardware is
available today.
So these drivers are a prime candidate for removal. If someone is
interested in working on these drivers to prevent their removal, then
please contact the linux-media mailinglist.
Let's be honest, the age of parallel port webcams and ISA video capture
boards is really gone.
Hans Verkuil [Tue, 2 Dec 2014 15:40:32 +0000 (12:40 -0300)]
[media] tlg2300: move to staging in preparation for removal
This driver hasn't been tested in a long, long time. The company that made
this chip has gone bust many years ago and hardware using this chip is next
to impossible to find.
This driver needs to be converted to newer media frameworks but due to the
lack of hardware that's going to be impossible. Since cheap alternatives are
easily available, there is little point in keeping this driver alive.
In other words, this driver is a prime candidate for removal. If someone is
interested in working on this driver to prevent its removal, then please
contact the linux-media mailinglist.
Hans Verkuil [Tue, 2 Dec 2014 15:40:31 +0000 (12:40 -0300)]
[media] vino/saa7191: move to staging in preparation for removal
These drivers haven't been tested in a long, long time. The hardware is
ancient and hopelessly obsolete. These drivers also need to be converted
to newer media frameworks but due to the lack of hardware that's going
to be impossible.
So these drivers are a prime candidate for removal. If someone is
interested in working on these drivers to prevent their removal, then
please contact the linux-media mailinglist.
The start_streaming op is responsible for starting the video dma,
so it shouldn't be called anymore from the buf_queue op.
Unfortunately, this call to start_video_dma() was added to the
start_streaming op, but was forgotten to be removed from the
buf_queue op, which is where it used to be before the vb2 conversion.
Calling this function twice causes very hard to find errors: sometimes
it works, sometimes it doesn't. It took me a whole friggin' day
to track this down, and in the end it was just luck that my eye suddenly
triggered on that line.
Hans Verkuil [Mon, 8 Dec 2014 16:23:49 +0000 (13:23 -0300)]
[media] cx88: add missing alloc_ctx support
The cx88 vb2 conversion and the vb2 dma_sg improvements were developed separately and
were merged separately. Unfortunately, the patch updating drivers to the dma_sg
improvements didn't take the updated cx88 driver into account. Basically two ships
passing in the night, unaware of one another even though both ships have the same
owner, i.e. me :-)
Hans Verkuil [Fri, 5 Dec 2014 13:02:47 +0000 (10:02 -0300)]
[media] v4l2-mediabus.h: use two __u16 instead of two __u32
The ycbcr_enc and quantization fields do not need a __u32. Switch to
two __u16 types, thus preserving alignment and avoiding holes in the
struct. This makes one more __u32 available for future expansion.
Linus Torvalds [Tue, 16 Dec 2014 23:53:03 +0000 (15:53 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
"Next pile (and there'll be one or two more).
The large piece in this one is getting rid of /proc/*/ns/* weirdness;
among other things, it allows to (finally) make nameidata completely
opaque outside of fs/namei.c, making for easier further cleanups in
there"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_venus_readdir(): use file_inode()
fs/namei.c: fold link_path_walk() call into path_init()
path_init(): don't bother with LOOKUP_PARENT in argument
fs/namei.c: new helper (path_cleanup())
path_init(): store the "base" pointer to file in nameidata itself
make default ->i_fop have ->open() fail with ENXIO
make nameidata completely opaque outside of fs/namei.c
kill proc_ns completely
take the targets of /proc/*/ns/* symlinks to separate fs
bury struct proc_ns in fs/proc
copy address of proc_ns_ops into ns_common
new helpers: ns_alloc_inum/ns_free_inum
make proc_ns_operations work with struct ns_common * instead of void *
switch the rest of proc_ns_operations to working with &...->ns
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
common object embedded into various struct ....ns
Linus Torvalds [Tue, 16 Dec 2014 23:46:01 +0000 (15:46 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs and reiserfs fixes from Jan Kara:
"A reiserfs and an isofs fix. They arrived after I sent you my first
pull request and I don't want to delay them unnecessarily till rc2"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
isofs: Fix infinite looping over CE entries
reiserfs: destroy allocated commit workqueue
Linus Torvalds [Tue, 16 Dec 2014 23:25:31 +0000 (15:25 -0800)]
Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
Linus Torvalds [Tue, 16 Dec 2014 22:53:01 +0000 (14:53 -0800)]
Merge tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC/iommu configuration update from Arnd Bergmann:
"The iomm-config branch contains work from Will Deacon, quoting his
description:
This series adds automatic IOMMU and DMA-mapping configuration for
OF-based DMA masters described using the generic IOMMU devicetree
bindings. Although there is plenty of future work around splitting up
iommu_ops, adding default IOMMU domains and sorting out automatic IOMMU
group creation for the platform_bus, this is already useful enough for
people to port over their IOMMU drivers and start using the new probing
infrastructure (indeed, Marek has patches queued for the Exynos IOMMU).
The branch touches core ARM and IOMMU driver files, and the respective
maintainers (Russell King and Joerg Roedel) agreed to have the
contents merged through the arm-soc tree.
The final version was ready just before the merge window, so we ended
up delaying it a bit longer than the rest, but we don't expect to see
regressions because this is just additional infrastructure that will
get used in drivers starting in 3.20 but is unused so far"
* tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
iommu: store DT-probed IOMMU data privately
arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops
arm: call iommu_init before of_platform_populate
dma-mapping: detect and configure IOMMU in of_dma_configure
iommu: fix initialization without 'add_device' callback
iommu: provide helper function to configure an IOMMU for an of master
iommu: add new iommu_ops callback for adding an OF device
dma-mapping: replace set_arch_dma_coherent_ops with arch_setup_dma_ops
iommu: provide early initialisation hook for IOMMU drivers
Linus Torvalds [Tue, 16 Dec 2014 22:26:26 +0000 (14:26 -0800)]
Merge tag 'dt2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC DT updates part 2 from Arnd Bergmann:
"This is a follow-up to the early ARM SoC DT changes, with additional
content that has external dependencies:
- The Tegra IOMMU DT support depends on changes from the iommu tree,
plus the contents of the arm-soc drivers branch
- The MVEBU PHY support depends on changes from the phy tree
- The AT91 DT support depends on changes from the RTC and DMA-slave
trees
All of these changes just enable additional devices for existing
platforms"
* tag 'dt2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: tegra: Enable IOMMU for display controllers on Tegra124
ARM: tegra: Enable IOMMU for display controllers on Tegra114
ARM: tegra: Enable IOMMU for display controllers on Tegra30
ARM: tegra: Add memory controller support for Tegra124
ARM: tegra: Add memory controller support for Tegra114
ARM: tegra: Add memory controller support for Tegra30
ARM: tegra: Add APB_MISC_GP as a MIPI pad control bank
ARM: mvebu: add PHY support to the dts for the USB controllers on Armada 375
ARM: mvebu: add Device Tree description of USB cluster controller on Armada 375
ARM: at91/dt: at91sam9g45: add ISI node
ARM: at91/dt: enable the RTT block on the at91sam9m10g45ek board
ARM: at91/dt: enable the RTT block on the sam9g20ek board
ARM: at91/dt: add GPBR nodes
ARM: at91/dt: add RTT nodes to at91 dtsis
ARM: at91/dt: at91sam9rl: add rtc
ARM: at91: fix GPLv2 wording
ARM: at91/dt: sama5d4: add DMA support
ARM: at91/dt: sama5d4: use macro instead of numeric value
Linus Torvalds [Tue, 16 Dec 2014 22:17:36 +0000 (14:17 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are the first arm-soc bug fixes. Most of these are OMAP related
fixes for regressions or minor bugs. Aside from that, there are a few
defconfig changes for various platforms"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
iommu/exynos: Fix arm64 allmodconfig build
ARM: defconfigs: use CONFIG_CPUFREQ_DT
ARM: omap2plus_defconfig: Enable AHCI_PLATFORM driver
ARM: dts: am437x-sk-evm.dts: fix LCD timings
ARM: dts: dra7-evm: Update SMPS7 (VDD_CORE) max voltage to match DM
ARM: dts: dra7-evm: Fix typo in SMPS6 (VDD_GPU) max voltage
ARM: OMAP2+: AM43x: Add ID for ES1.2
ARM: dts: am437x-sk: fix lcd enable pin mux data
ARM: dts: Fix gpmc regression for omap 2430sdp smc91x
Revert "ARM: shmobile: multiplatform: add Audo DMAC peri peri support on defconfig"
ARM: dts: dra7: fix DSS PLL clock mux registers
ARM: dts: DRA7: wdt: Fix compatible property for watchdog node
ARM: OMAP2+: clock: remove unused function prototype
Linus Torvalds [Tue, 16 Dec 2014 22:12:33 +0000 (14:12 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Given that my availability next week is likely to be poor, here are
three arm64 fixes to resolve some issues introduced by features merged
last week. I was going to wait until -rc1, but it doesn't make much
sense to sit on fixes.
Fix some fallout introduced during the merge window:
- Build failure when PM_SLEEP is disabled but CPU_IDLE is enabled
- Compiler warning from page table dumper w/ 48-bit VAs
- Erroneous page table truncation in reported dump"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mm: dump: don't skip final region
arm64: mm: dump: fix shift warning
arm64: psci: Fix build breakage without PM_SLEEP
Linus Torvalds [Tue, 16 Dec 2014 22:08:53 +0000 (14:08 -0800)]
Merge tag 'xtensa-next-20141215' of git://github.com/czankel/xtensa-linux
Pull Xtensa fixes from Chris Zankel:
- fix nommu support
- remove s6000 variant and s6105 platform
- fix permissions for kmapped pages so that copy_to_user_page works with them
- add power management menu to Kconfig to allow use of runtime PM
- disable linker optimizations because of a linker bug
- fix sparse error
* tag 'xtensa-next-20141215' of git://github.com/czankel/xtensa-linux:
xtensa: disable link optimization
xtensa/uaccess: fix sparse errors
xtensa: fix kmap_prot definition
xtensa: add power management menu to Kconfig
xtensa: remove s6000 variant and s6105 platform
xtensa: make PLATFORM_DEFAULT_MEM parameters configurable
xtensa: nommu: clean up memory map dump
xtensa: nommu: reserve memory below PLATFORM_DEFAULT_MEM_START
xtensa: nommu: set up cache and atomctl in initialize_mmu
xtensa: move vecbase SR initialization to _startup
xtensa: nommu: fix uImage load address
xtensa: nommu: fix load address definitions
xtensa: nommu: fix Image.elf reset code and ld script
xtensa: nommu: add MMU dependency to DEBUG_TLB_SANITY
xtensa: nommu: don't build most of the cache flushing code
xtensa: nommu: don't provide arch_get_unmapped_area
xtensa: nommu: provide MAP_UNINITIALIZED definition
xtensa: nommu: provide _PAGE_CHG_MASK definition
xtensa: nommu: provide __invalidate_dcache_page_alias stub
xtensa: nommu: move init_mmu stub to nommu_context.h
Pull arch/tile updates from Chris Metcalf:
"Note that one of the changes converts my old [email protected] email
in MAINTAINERS to the [email protected] email that you see on this
email"
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: update MAINTAINERS email to EZchip
tile: avoid undefined behavior with regs[TREG_TP] etc
arch: tile: kernel: kgdb.c: Use memcpy() instead of pointer copy one by one
tile: Use the more common pr_warn instead of pr_warning
arch: tile: gxio: Export symbols for module using in 'mpipe.c'
arch: tile: kernel: signal.c: Use __copy_from/to_user() instead of __get/put_user()
Linus Torvalds [Tue, 16 Dec 2014 21:23:03 +0000 (13:23 -0800)]
Merge tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull additional xen update from David Vrabel:
"Xen: additional features for 3.19-rc0
- Linear p2m for x86 PV guests which simplifies the p2m code,
improves performance and will allow for > 512 GB PV guests in the
future.
A last-minute, configuration specific issue was discovered with this
change which is why it was not included in my previous pull request.
This is now been fixed and tested"
* tag 'stable/for-linus-3.19-rc0b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: switch to post-init routines in xen mmu.c earlier
Revert "swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single"
xen: annotate xen_set_identity_and_remap_chunk() with __init
xen: introduce helper functions to do safe read and write accesses
xen: Speed up set_phys_to_machine() by using read-only mappings
xen: switch to linear virtual mapped sparse p2m list
xen: Hide get_phys_to_machine() to be able to tune common path
x86: Introduce function to get pmd entry pointer
xen: Delay invalidating extra memory
xen: Delay m2p_override initialization
xen: Delay remapping memory of pv-domain
xen: use common page allocation function in p2m.c
xen: Make functions static
xen: fix some style issues in p2m.c
Linus Torvalds [Tue, 16 Dec 2014 21:15:12 +0000 (13:15 -0800)]
Merge tag 'linux-kselftest-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"kselftest updates for 3.19-rc1:
- kcmp test include file cleanup
- kcmp change to build on all architectures
- A light weight kselftest framework that provides a set of
interfaces for tests to use to report results. In addition,
several tests are updated to use the framework.
- A new runtime system size test that prints the amount of RAM that
the currently running system is using"
* tag 'linux-kselftest-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftest: size: Add size test for Linux kernel
selftests/kcmp: Always try to build the test
selftests/kcmp: Don't include kernel headers
kcmp: Move kcmp.h into uapi
selftests/timers: change test to use ksft framework
selftests/kcmp: change test to use ksft framework
selftests/ipc: change test to use ksft framework
selftests/breakpoints: change test to use ksft framework
selftests: add kselftest framework for uniform test reporting
selftests/user: move test out of Makefile into a shell script
selftests/net: move test out of Makefile into a shell script
Linus Torvalds [Tue, 16 Dec 2014 20:53:59 +0000 (12:53 -0800)]
Merge tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"As the merge window is still open, and this code was not as complex as
I thought it might be. I'm pushing this in now.
This will allow Thomas to debug his irq work for 3.20.
This adds two new features:
1) Allow traceopoints to be enabled right after mm_init().
By passing in the trace_event= kernel command line parameter,
tracepoints can be enabled at boot up. For debugging things like
the initialization of interrupts, it is needed to have tracepoints
enabled very early. People have asked about this before and this
has been on my todo list. As it can be helpful for Thomas to debug
his upcoming 3.20 IRQ work, I'm pushing this now. This way he can
add tracepoints into the IRQ set up and have users enable them when
things go wrong.
2) Have the tracepoints printed via printk() (the console) when they
are triggered.
If the irq code locks up or reboots the box, having the tracepoint
output go into the kernel ring buffer is useless for debugging.
But being able to add the tp_printk kernel command line option
along with the trace_event= option will have these tracepoints
printed as they occur, and that can be really useful for debugging
early lock up or reboot problems.
This code is not that intrusive and it passed all my tests. Thomas
tried them out too and it works for his needs.
Link: http://lkml.kernel.org/r/[email protected]"
* tag 'trace-3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add tp_printk cmdline to have tracepoints go to printk()
tracing: Move enabling tracepoints to just after rcu_init()
Arnd Bergmann [Tue, 16 Dec 2014 20:43:15 +0000 (21:43 +0100)]
Merge tag 'renesas-defconfig-fixes-for-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes
Pull "Renesas ARM Based SoC Defconfig Fixes for v3.19" from Simon Horman:
* Revert change enabling RCAR_AUDMAC_PP in shmobile_defconfig
Unfortunately enabling RCAR_AUDMAC_PP support this patch breaks dmaengine
support on R-Car Gen2 boards. This should be resolved by driver updates
in v3.20. But v3.19 was too early for this defconfig change.
* tag 'renesas-defconfig-fixes-for-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
Revert "ARM: shmobile: multiplatform: add Audo DMAC peri peri support on defconfig"
Mark Brown [Mon, 15 Dec 2014 15:54:42 +0000 (15:54 +0000)]
iommu/exynos: Fix arm64 allmodconfig build
The Exynos IOMMU driver uses the ARM specific dmac_flush_range() and
outer_flush_range() functions. This breaks the build on arm64 allmodconfig
in -next since support has been merged for some Exynos ARMv8 SoCs. Add a
dependency on ARM to keep things building until either the driver has the
ARM dependencies removed or the ARMv8 architecture code implements these
ARM specific APIs.
Ido Shamay [Tue, 16 Dec 2014 11:28:54 +0000 (13:28 +0200)]
net/mlx4: Cache line CQE/EQE stride fixes
This commit contains 2 fixes for the 128B CQE/EQE stride feaure.
Wei found that mlx4_QUERY_HCA function marked the wrong capability
in flags (64B CQE/EQE), when CQE/EQE stride feature was enabled.
Also added small fix in initial CQE ownership bit assignment, when CQE
is size is not default 32B.
Nimrod Andy [Tue, 16 Dec 2014 10:25:58 +0000 (18:25 +0800)]
net: fec: Fix NAPI race
Do camera capture test on i.MX6q sabresd board, and save the capture data to
nfs rootfs. The command is:
gst-launch-1.0 -e imxv4l2src device=/dev/video1 num-buffers=2592000 ! tee name=t !
queue ! imxv4l2sink sync=false t. ! queue ! vpuenc ! queue ! mux. pulsesrc num-buffers=3720937
blocksize=4096 ! 'audio/x-raw, rate=44100, channels=2' ! queue ! imxmp3enc ! mpegaudioparse !
queue ! mux. qtmux name=mux ! filesink location=video_recording_long.mov
After about 10 hours running, there have net watchdog timeout kernel dump:
...
WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x2b4/0x2d8()
NETDEV WATCHDOG: eth0 (fec): transmit queue 0 timed out
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.24-01051-gdb840b7 #440
[<80014e6c>] (unwind_backtrace) from [<800118ac>] (show_stack+0x10/0x14)
[<800118ac>] (show_stack) from [<806ae3f0>] (dump_stack+0x78/0xc0)
[<806ae3f0>] (dump_stack) from [<8002b504>] (warn_slowpath_common+0x68/0x8c)
[<8002b504>] (warn_slowpath_common) from [<8002b558>] (warn_slowpath_fmt+0x30/0x40)
[<8002b558>] (warn_slowpath_fmt) from [<8055e0d4>] (dev_watchdog+0x2b4/0x2d8)
[<8055e0d4>] (dev_watchdog) from [<800352d8>] (call_timer_fn.isra.33+0x24/0x8c)
[<800352d8>] (call_timer_fn.isra.33) from [<800354c4>] (run_timer_softirq+0x184/0x220)
[<800354c4>] (run_timer_softirq) from [<8002f420>] (__do_softirq+0xc0/0x22c)
[<8002f420>] (__do_softirq) from [<8002f804>] (irq_exit+0xa8/0xf4)
[<8002f804>] (irq_exit) from [<8000ee5c>] (handle_IRQ+0x54/0xb4)
[<8000ee5c>] (handle_IRQ) from [<80008598>] (gic_handle_irq+0x28/0x5c)
[<80008598>] (gic_handle_irq) from [<800123c0>] (__irq_svc+0x40/0x74)
Exception stack(0x80d27f18 to 0x80d27f60)
7f00: 80d27f600000014c
7f20: 8858c60e0000004d884e45400000004dab7250d080d343480000000000000000
7f40: 00000001000000000000001780d27f60800702a480476e6c600f0013ffffffff
[<800123c0>] (__irq_svc) from [<80476e6c>] (cpuidle_enter_state+0x50/0xe0)
[<80476e6c>] (cpuidle_enter_state) from [<80476fa8>] (cpuidle_idle_call+0xac/0x154)
[<80476fa8>] (cpuidle_idle_call) from [<8000f174>] (arch_cpu_idle+0x8/0x44)
[<8000f174>] (arch_cpu_idle) from [<80064c54>] (cpu_startup_entry+0x100/0x158)
[<80064c54>] (cpu_startup_entry) from [<80cd8a9c>] (start_kernel+0x304/0x368)
---[ end trace 09ebd32fb032f86d ]---
...
There might have a race in napi_schedule(), leaving interrupts disabled forever.
After these patch, the case still work more than 40 hours running.
David Vrabel [Tue, 16 Dec 2014 18:59:46 +0000 (18:59 +0000)]
xen-netfront: use napi_complete() correctly to prevent Rx stalling
After d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less interrupt
masking in NAPI) the napi instance is removed from the per-cpu list
prior to calling the n->poll(), and is only requeued if all of the
budget was used. This inadvertently broke netfront because netfront
does not use NAPI correctly.
If netfront had not used all of its budget it would do a final check
for any Rx responses and avoid calling napi_complete() if there were
more responses. It would still return under budget so it would never
be rescheduled. The final check would also not re-enable the Rx
interrupt.
Additionally, xenvif_poll() would also call napi_complete() /after/
enabling the interrupt. This resulted in a race between the
napi_complete() and the napi_schedule() in the interrupt handler. The
use of local_irq_save/restore() avoided by race iff the handler is
running on the same CPU but not if it was running on a different CPU.
Fix both of these by always calling napi_compete() if the budget was
not all used, and then calling napi_schedule() if the final checks
says there's more work.
Thomas Graf [Tue, 16 Dec 2014 20:05:21 +0000 (21:05 +0100)]
ip_tunnel: Add missing validation of encap type to ip_tunnel_encap_setup()
The encap->type comes straight from Netlink. Validate it against
max supported encap types just like ip_encap_hlen() already does.
Fixes: a8c5f9 ("ip_tunnel: Ops registration for secondary encap (fou, gue)") Signed-off-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Thomas Graf [Tue, 16 Dec 2014 20:05:20 +0000 (21:05 +0100)]
ip_tunnel: Add sanity checks to ip_tunnel_encap_add_ops()
The symbols are exported and could be used by external modules.
Fixes: a8c5f9 ("ip_tunnel: Ops registration for secondary encap (fou, gue)") Signed-off-by: Thomas Graf <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Please pull this batch of fixes intended for the 3.19 stream!
For the Bluetooth bits, Johan says:
"The patches consist of:
- Coccinelle warning fix
- hci_dev_lock/unlock fixes
- Fixes for pending mgmt command handling
- Fixes for properly following the force_lesc_support switch
- Fix for a Microsoft branded Broadcom adapter
- New device id for Atheros AR3012
- Fix for BR/EDR Secure Connections enabling"
Along with that...
Brian Norris avoids leaking some kernel memory contents via printk in brcmsmac.
Julia Lawall corrects some misspellings in a few drivers.
Larry Finger gives us one more rtlwifi fix to correct a porting oversight.
Wei Yongjun fixes a sparse warning in rtlwifi.
Please let me know if there are problems!
====================
ifreq flags field is only 16 bit wide, so setting IFF_VNET_LE there has
no effect:
doesn't fit in two bytes.
The tests passed apparently because they have an even number of bugs,
all cancelling out.
Luckily we didn't release a kernel with this flag, so it's
not too late to fix this.
Add TUNSETVNETLE/TUNGETVNETLE to really achieve the purpose
of IFF_VNET_LE.
This has an added benefit that if we ever want a BE flag,
we won't have to deal with weird configurations like
setting both LE and BE at the same time.
flags field in ifreq is only 16 bit wide, but
we read it as a 32 bit value.
If userspace doesn't zero-initialize unused fields,
this will lead to failures.
Chris Zankel [Tue, 16 Dec 2014 05:22:21 +0000 (21:22 -0800)]
xtensa: disable link optimization
The default linker behavior is to optimize identical literal values
and remove unnecessary overhead. However, because of a bug in the
linker, this currently results in an error ('call target out of range').
Disable link-time optimizations per default until there is a fix
for the linker and add the option to iss_defconfig.
This patch series removes the bogus "select FIXED_PHY if FOO=y" that I have
been using in GENET, SYSTEMPORT and the SF2 DSA switch driver.
====================
Florian Fainelli [Mon, 15 Dec 2014 17:57:15 +0000 (09:57 -0800)]
net: dsa: bcm_sf2: always select FIXED_PHY
There is no need to do the following:
select FIXED_PHY if NET_DSA_BCM_SF2=y, as this implies that we will not be
able to build and/or run the driver correctly when built as a module,
which is no longer an issue since commit 37e9a6904520 ("net: phy: export
fixed_phy_register()").
Fixes: 246d7f773c13ca ("net: dsa: add Broadcom SF2 switch driver") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Florian Fainelli [Mon, 15 Dec 2014 17:57:14 +0000 (09:57 -0800)]
net: systemport: always select FIXED_PHY
There is no need to do the following:
select FIXED_PHY if SYSTEMPORT=y, as this implies that we will not be able
to build and/or run the driver correctly when built as a module, which
is no longer an issue since commit 37e9a6904520 ("net: phy: export
fixed_phy_register()")
Fixes: a3862db2d3c4 ("net: systemport: hook SYSTEMPORT driver in the build") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Florian Fainelli [Mon, 15 Dec 2014 17:57:13 +0000 (09:57 -0800)]
net: bcmgenet: always select FIXED_PHY
There is no need to do the following:
select FIXED_PHY if BCMGENET=y, as this implies that we will not be able
to build and/or run the driver correctly when built as a module, which
is no longer an issue since commit 37e9a6904520 ("net: phy: export
fixed_phy_register()")
Fixes: b0ba512e225d ("net: bcmgenet: enable driver to work without device tree" Fixes: bdaa53bde55f ("net: bcmgenet: hook into the build system") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
virtio wants to read bitwise types from userspace using get_user. At the
moment this triggers sparse errors, since the value is passed through an
integer.
Haggai Eran [Thu, 11 Dec 2014 15:04:26 +0000 (17:04 +0200)]
IB/mlx5: Implement on demand paging by adding support for MMU notifiers
* Implement the relevant invalidation functions (zap MTTs as needed)
* Implement interlocking (and rollback in the page fault handlers) for
cases of a racing notifier and fault.
* With this patch we can now enable the capability bits for supporting RC
send/receive/RDMA read/RDMA write, and UD send.
Haggai Eran [Thu, 11 Dec 2014 15:04:24 +0000 (17:04 +0200)]
IB/mlx5: Handle page faults
This patch implement a page fault handler (leaving the pages pinned as
of time being). The page fault handler handles initiator and responder
page faults for UD/RC transports, for send/receive operations, as well
as RDMA read/write initiator support.
Haggai Eran [Thu, 11 Dec 2014 15:04:23 +0000 (17:04 +0200)]
IB/mlx5: Page faults handling infrastructure
* Refactor MR registration and cleanup, and fix reg_pages accounting.
* Create a work queue to handle page fault events in a kthread context.
* Register a fault handler to get events from the core for each QP.
The registered fault handler is empty in this patch, and only a later
patch implements it.
Haggai Eran [Thu, 11 Dec 2014 15:04:22 +0000 (17:04 +0200)]
IB/mlx5: Add mlx5_ib_update_mtt to update page tables after creation
The new function allows updating the page tables of a memory region
after it was created. This can be used to handle page faults and page
invalidations.
Since mlx5_ib_update_mtt will need to work from within page invalidation,
so it must not block on memory allocation. It employs an atomic memory
allocation mechanism that is used as a fallback when kmalloc(GFP_ATOMIC) fails.
In order to reuse code from mlx5_ib_populate_pas, the patch splits
this function and add the needed parameters.
Haggai Eran [Thu, 11 Dec 2014 15:04:21 +0000 (17:04 +0200)]
IB/mlx5: Changes in memory region creation to support on-demand paging
This patch wraps together several changes needed for on-demand paging support
in the mlx5_ib_populate_pas function, and when registering memory regions.
* Instead of accepting a UMR bit telling the function to enable all
access flags, the function now accepts the access flags themselves.
* For on-demand paging memory regions, fill the memory tables from the
correct list, and enable/disable the access flags per-page according
to whether the page is present.
* A new bit is set to enable writing of access flags when using the
firmware create_mkey command.
* Disable contig pages when on-demand paging is enabled.
In addition the patch changes the UMR code to use PTR_ALIGN instead of
our own macro.
Haggai Eran [Thu, 11 Dec 2014 15:04:20 +0000 (17:04 +0200)]
IB/mlx5: Implement the ODP capability query verb
The patch adds infrastructure to query ODP capabilities in the mlx5
driver. The code will read the capabilities from the device, and
enable only those capabilities that both the driver and the device
supports. At this point ODP is not supported, so no capability is
copied from the device, but the patch exposes the global ODP device
capability bit.
Haggai Eran [Thu, 11 Dec 2014 15:04:19 +0000 (17:04 +0200)]
mlx5_core: Add support for page faults events and low level handling
* Add a handler function pointer in the mlx5_core_qp struct for page
fault events. Handle page fault events by calling the handler
function, if not NULL.
* Add on-demand paging capability query command.
* Export command for resuming QPs after page faults.
* Add various constants related to paging support.
Roland Dreier [Tue, 16 Dec 2014 02:17:17 +0000 (18:17 -0800)]
mlx5_core: Re-add MLX5_DEV_CAP_FLAG_ON_DMND_PG flag
In commit 0c7aac854f52 ("net/mlx5_core: Remove unused dev cap enum
fields"), the flag MLX5_DEV_CAP_FLAG_ON_DMND_PG was removed.
Unfortunately the on-demand paging changes actually use it, so re-add
the missing flag.
Haggai Eran [Thu, 11 Dec 2014 15:04:18 +0000 (17:04 +0200)]
IB/core: Implement support for MMU notifiers regarding on demand paging regions
* Add an interval tree implementation for ODP umems. Create an
interval tree for each ucontext (including a count of the number of
ODP MRs in this context, semaphore, etc.), and register ODP umems in
the interval tree.
* Add MMU notifiers handling functions, using the interval tree to
notify only the relevant umems and underlying MRs.
* Register to receive MMU notifier events from the MM subsystem upon
ODP MR registration (and unregister accordingly).
* Add a completion object to synchronize the destruction of ODP umems.
* Add mechanism to abort page faults when there's a concurrent invalidation.
The way we synchronize between concurrent invalidations and page
faults is by keeping a counter of currently running invalidations, and
a sequence number that is incremented whenever an invalidation is
caught. The page fault code checks the counter and also verifies that
the sequence number hasn't progressed before it updates the umem's
page tables. This is similar to what the kvm module does.
In order to prevent the case where we register a umem in the middle of
an ongoing notifier, we also keep a per ucontext counter of the total
number of active mmu notifiers. We only enable new umems when all the
running notifiers complete.
Shachar Raindel [Thu, 11 Dec 2014 15:04:17 +0000 (17:04 +0200)]
IB/core: Add support for on demand paging regions
* Extend the umem struct to keep the ODP related data.
* Allocate and initialize the ODP related information in the umem
(page_list, dma_list) and freeing as needed in the end of the run.
* Store a reference to the process PID struct in the ucontext. Used to
safely obtain the task_struct and the mm during fault handling,
without preventing the task destruction if needed.
* Add 2 helper functions: ib_umem_odp_map_dma_pages and
ib_umem_odp_unmap_dma_pages. These functions get the DMA addresses
of specific pages of the umem (and, currently, pin them).
* Support for page faults only - IB core will keep the reference on
the pages used and call put_page when freeing an ODP umem
area. Invalidations support will be added in a later patch.
Sagi Grimberg [Thu, 11 Dec 2014 15:04:16 +0000 (17:04 +0200)]
IB/core: Add flags for on demand paging support
* Add a configuration option for enable on-demand paging support in
the infiniband subsystem (CONFIG_INFINIBAND_ON_DEMAND_PAGING). In a
later patch, this configuration option will select the MMU_NOTIFIER
configuration option to enable mmu notifiers.
* Add a flag for on demand paging (ODP) support in the IB device capabilities.
* Add a flag to request ODP MR in the access flags to reg_mr.
* Fail registrations done with the ODP flag when the low-level driver
doesn't support this.
* Change the conditions in which an MR will be writable to explicitly
specify the access flags. This is to avoid making an MR writable just
because it is an ODP MR.
* Add a ODP capabilities to the extended query device verb.
Eli Cohen [Thu, 11 Dec 2014 15:04:15 +0000 (17:04 +0200)]
IB/core: Add support for extended query device caps
Add extensible query device capabilities verb to allow adding new features.
ib_uverbs_ex_query_device is added and copy_query_dev_fields is used to
copy capability fields to be used by both ib_uverbs_query_device and
ib_uverbs_ex_query_device.
Haggai Eran [Thu, 11 Dec 2014 15:04:14 +0000 (17:04 +0200)]
IB/mlx5: Add function to read WQE from user-space
Add a helper function mlx5_ib_read_user_wqe to read information from
user-space owned work queues. The function will be used in a later
patch by the page-fault handling code in mlx5_ib.
Signed-off-by: Haggai Eran <[email protected]>
[ Add stub for ib_umem_copy_from() for CONFIG_INFINIBAND_USER_MEM=n
- Roland ]
Haggai Eran [Thu, 11 Dec 2014 15:04:11 +0000 (17:04 +0200)]
IB/mlx5: Enhance UMR support to allow partial page table update
The current UMR interface doesn't allow partial updates to a memory
region's page tables. This patch changes the interface to allow that.
It also changes the way the UMR operation validates the memory
region's state. When set, IB_SEND_UMR_FAIL_IF_FREE will cause the UMR
operation to fail if the MKEY is in the free state. When it is
unchecked the operation will check that it isn't in the free state.
Haggai Eran [Thu, 11 Dec 2014 15:04:10 +0000 (17:04 +0200)]
IB/mlx5: Remove per-MR pas and dma pointers
Since UMR code now uses its own context struct on the stack, the pas
and dma pointers for the UMR operation that remained in the mlx5_ib_mr
struct are not necessary. This patch removes them.
Fixes: a74d24168d2d ("IB/mlx5: Refactor UMR to have its own context struct") Signed-off-by: Haggai Eran <[email protected]> Signed-off-by: Roland Dreier <[email protected]>