Linus Torvalds [Wed, 17 Dec 2014 20:31:40 +0000 (12:31 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace related fixes from Eric Biederman:
"As these are bug fixes almost all of these changes are marked for
backporting to stable.
The first change (implicitly adding MNT_NODEV on remount) addresses a
regression that was created when security issues with unprivileged
remount were closed. I go on to update the remount test to make it
easy to detect if this issue reoccurs.
Then there are a handful of mount and umount related fixes.
Then half of the changes deal with a recently discovered design
bug in the permission checks of gid_map. Unix since the beginning has
allowed setting group permissions on files to less than the user and
other permissions (aka rwx---rwx). As the unix permission checks
stop as soon as a group matches, and setgroups allows setting groups
that can not later be dropped, this results in a situation where it is
possible to legitimately use a group to assign fewer privileges to a
process. Which means dropping a group can increase a process's
privileges.
The fix I have adopted is that gid_map is now no longer writable
without privilege unless the new file /proc/self/setgroups has been
set to permanently disable setgroups.
The bulk of user namespace using applications, even the applications
using user namespaces without privilege, remain
unaffected by this change. Unfortunately this fix breaks a couple of user
space applications that were relying on the problematic behavior (one
of which was tools/selftests/mount/unprivileged-remount-test.c).
To hopefully prevent needing a regression fix on top of my security
fix I rounded up folks who work with the container implementations most
likely to be affected and encouraged them to test the changes.
> So far nothing broke on my libvirt-lxc test bed. :-)
> Tested with openSUSE 13.2 and libvirt 1.2.9.
> Tested-by: Richard Weinberger <[email protected]>
> Tested on Fedora20 with libvirt 1.2.11, works fine.
> Tested-by: Chen Hanxiao <[email protected]>
> Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> Just to be sure I was testing the right thing I also tested using
> my unprivileged nsexec testcases, and they failed on setgroup/setgid
> as now expected, and succeeded there without your patches.
> Tested-by: Serge Hallyn <[email protected]>
> I tested this with Sandstorm. It breaks as is and it works if I add
> the setgroups thing.
> Tested-by: Andy Lutomirski <[email protected]> # breaks things as designed :("
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Unbreak the unprivileged remount tests
userns; Correct the comment in map_write
userns: Allow setting gid_maps without privilege when setgroups is disabled
userns: Add a knob to disable setgroups on a per user namespace basis
userns: Rename id_map_mutex to userns_state_mutex
userns: Only allow the creator of the userns unprivileged mappings
userns: Check euid no fsuid when establishing an unprivileged uid mapping
userns: Don't allow unprivileged creation of gid mappings
userns: Don't allow setgroups until a gid mapping has been setablished
userns: Document what the invariant required for safe unprivileged mappings.
groups: Consolidate the setgroups permission checks
mnt: Clear mnt_expire during pivot_root
mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
umount: Do not allow unmounting rootfs.
umount: Disallow unprivileged mount force
mnt: Update unprivileged remount test
mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
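The gid_map design bug described in the pull message above comes down to the order of the classic Unix permission check. The following is a minimal sketch in Python, assuming a simplified model with no capabilities, ACLs, or LSMs; `may_access` and its parameters are hypothetical names used only for illustration. It shows how dropping a group can grant more access when a file's group bits are narrower than its "other" bits:

```python
def may_access(mode, file_uid, file_gid, uid, groups, want):
    """Simplified classic Unix check: the first matching class decides.

    mode  -- permission bits, e.g. 0o707
    want  -- requested bits as a 3-bit mask (4=read, 2=write, 1=execute)
    """
    if uid == file_uid:
        granted = (mode >> 6) & 7      # owner bits
    elif file_gid in groups:
        granted = (mode >> 3) & 7      # group bits; "other" is never consulted
    else:
        granted = mode & 7             # other bits
    return (granted & want) == want


# A file with mode rwx---rwx owned by uid 0, gid 100 (illustrative values).
MODE = 0o707
in_group = may_access(MODE, 0, 100, uid=1000, groups={100, 1000}, want=4)
dropped = may_access(MODE, 0, 100, uid=1000, groups={1000}, want=4)
print(in_group, dropped)
```

In this model the process is denied while it is a member of gid 100 and allowed once that group is gone, which is why an unprivileged way to drop groups is a problem.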
Marcel Holtmann [Wed, 17 Dec 2014 17:18:08 +0000 (18:18 +0100)]
Bluetooth: Fix bug with filter in service discovery optimization
The optimization for filtering out extended inquiry results, advertising
reports or scan response data based on provided UUID list has a logic
bug. In case no match is found in the advertising data, the scan
response is ignored and not checked against the filter. This will lead
to events being filtered wrongly.
Change the code to actually only drop the events when the scan response
data is not present. If it is present, it needs to be checked against
the provided filter.
The patch is a bit more complex than it needs to be. That is because
it also fixes this compiler warning that some gcc versions produce.
CC net/bluetooth/mgmt.o
net/bluetooth/mgmt.c: In function ‘mgmt_device_found’:
net/bluetooth/mgmt.c:7028:7: warning: ‘match’ may be used uninitialized in this function [-Wmaybe-uninitialized]
bool match;
^
It seems that gcc can not clearly figure out the context of the match
variable. So just change the branches for the extended inquiry response
and advertising data around so that it is clear.
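The corrected filter decision can be sketched roughly as follows. This is a simplified Python model, not the code in net/bluetooth/mgmt.c: `should_report` is a hypothetical name, and the UUIDs are assumed to be already parsed out of the EIR, advertising, and scan response data.

```python
def should_report(uuid_filter, adv_uuids, scan_rsp_uuids=None):
    """Return True if a found-device event should be delivered.

    uuid_filter    -- set of UUIDs the caller asked for (empty = no filtering)
    adv_uuids      -- UUIDs parsed from EIR / advertising data
    scan_rsp_uuids -- UUIDs parsed from scan response data, or None if absent
    """
    if not uuid_filter:
        return True
    if uuid_filter & set(adv_uuids):
        return True
    # No match in the advertising data: only drop right away when there
    # is no scan response to look at.
    if scan_rsp_uuids is None:
        return False
    return bool(uuid_filter & set(scan_rsp_uuids))


print(should_report({"180d"}, ["180f"], ["180d"]))
```

The point of the sketch is the third branch: a miss in the advertising data is not final as long as scan response data is present.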
Dave Hansen reports that commit fb7332a9fedf ("mmu_gather: move minimal
range calculations into generic code") caused a performance problem:
"tlb_finish_mmu() goes up about 9x in the profiles (~0.4%->3.6%) and
tlb_flush_mmu_free() takes about 3.1% of CPU time with the patch
applied, but does not show up at all on the commit before"
and the reason is that Will moved the test for whether we need to flush
from tlb_flush_mmu() into tlb_flush_mmu_tlbonly(). But that meant that
tlb_flush_mmu_free() basically lost that check.
Move it back into tlb_flush_mmu() where it belongs, so that it covers
both tlb_flush_mmu_tlbonly() _and_ tlb_flush_mmu_free().
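Why the position of the check matters can be seen in a toy model. The Python below is only a sketch under stated assumptions: the `Gather` class, its `need_flush` flag, and the three functions are hypothetical stand-ins for the kernel's mmu_gather state and helpers, not their real implementation.

```python
class Gather:
    """Toy stand-in for the mmu_gather state (hypothetical fields)."""

    def __init__(self):
        self.need_flush = False   # set when a range has been queued
        self.calls = []           # records which helpers actually ran


def flush_tlbonly(tlb):
    tlb.calls.append("tlbonly")
    tlb.need_flush = False


def flush_free(tlb):
    tlb.calls.append("free")


def flush_mmu(tlb):
    # The check lives in the caller, so it guards both helpers.
    if not tlb.need_flush:
        return
    flush_tlbonly(tlb)
    flush_free(tlb)


idle = Gather()
flush_mmu(idle)

busy = Gather()
busy.need_flush = True
flush_mmu(busy)
print(idle.calls, busy.calls)
```

If the test sat only inside `flush_tlbonly`, `flush_free` would run on every call, which matches the extra time the profile attributes to tlb_flush_mmu_free().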
Linus Torvalds [Wed, 17 Dec 2014 19:52:37 +0000 (11:52 -0800)]
x86: mm: fix VM_FAULT_RETRY handling
My commit 26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
had a really stupid typo: the FAULT_FLAG_USER bit is in the 'flags'
variable, not the 'fault' variable. Duh.
The one silver lining in this is that Dave finding this at least
confirms that trinity actually triggers this special path easily, in a
way normal use does not.
Linus Torvalds [Wed, 17 Dec 2014 18:44:22 +0000 (10:44 -0800)]
Merge tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:
- s390 support (Frank Blaschka)
- Enable iommu-type1 for ARM SMMU (Will Deacon)
* tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio:
drivers/vfio: allow type-1 IOMMU instantiation on top of an ARM SMMU
vfio: make vfio run on s390
Linus Torvalds [Wed, 17 Dec 2014 18:37:56 +0000 (10:37 -0800)]
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"A balloon enhancement, and a minor race-on-module-unload theoretical
bug which doesn't merit cc: stable.
All the exciting stuff went via MST this cycle"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_balloon: free some memory from balloon on OOM
virtio_balloon: return the amount of freed memory from leak_balloon()
virtio_blk: fix race at module removal
virtio: Fix comment typo 'CONFIG_S_FAILED'
Linus Torvalds [Wed, 17 Dec 2014 18:16:27 +0000 (10:16 -0800)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management update from Zhang Rui:
"Summary:
- of-thermal extension to allow drivers to register and use its
functionality in a better way, without exploiting thermal core.
From Lukasz Majewski.
- Fix a bug in intel_soc_dts_thermal driver which calls a sleep
function in interrupt handler. From Maurice Petallo.
- add a thermal UAPI header file for exporting the thermal generic
netlink information to user-space. From Florian Fainelli.
- First round of refactoring in Exynos driver. Bartlomiej and Lukasz
are attempting to make it lean and easier to understand.
- New thermal driver for Rockchip (rk3288), with support for DT
thermal. From Caesar Wang.
- New thermal driver for Nvidia, Tegra124 SOCTHERM driver, with
support for DT thermal. From Mikko Perttunen.
- New cooling device, based on common clock framework. From Eduardo
Valentin.
- a couple of small fixes in thermal core framework. From Srinivas
Pandruvada, Javi Merino, Luis Henriques.
- Dropping Armada A375-Z1 SoC thermal support as the chip is not in
the market, armada folks decided to drop its support.
- a couple of small fixes and cleanups in int340x thermal driver"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (58 commits)
thermal: provide an UAPI header file
Thermal/int340x: Clear the error value of the last acpi_bus_get_device() call
thermal/powerclamp: add id for braswell cpu
thermal: Intel SoC DTS: Don't do thermal zone update inside spin_lock
Thermal: fix platform_no_drv_owner.cocci warnings
Thermal/int340x: avoid unnecessary pointer casting
thermal: int3403: Delete a check before thermal_zone_device_unregister()
thermal/int3400: export uuids
thermal: of: Extend current of-thermal.c code to allow setting emulated temp
thermal: of: Extend of-thermal to export table of trip points
thermal: of: Rename struct __thermal_trip to struct thermal_trip
thermal: of: Extend of-thermal.c to provide check if trip point is valid
thermal: of: Extend of-thermal.c to provide number of trip points
thermal: Fix error path in thermal_init()
thermal: lock the thermal zone when switching governors
thermal: core: ignore invalid trip temperature
thermal: armada: Remove support for A375-Z1 SoC
thermal: rockchip: add driver for thermal
dt-bindings: document Rockchip thermal
thermal: exynos: remove exynos_tmu_data.h include
...
Linus Torvalds [Wed, 17 Dec 2014 18:10:51 +0000 (10:10 -0800)]
Merge tag 'pwm/for-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
Pull pwm updates from Thierry Reding:
"There are two new drivers, one for the BCM2835 (Raspberry Pi) and one
used in conjunction with the LCD controller on various Atmel SoCs.
The Samsung PWM driver can now be built for 64-bit ARM (Exynos7).
A couple of fixes have been applied to the FTM PWM driver and system
sleep support was added"
* tag 'pwm/for-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: atmel-hlcdc: add at91sam9x5 and sama5d3 errata handling
pwm: ftm: Add Power Management support for FTM PWM
pwm: ftm: Add regmap rbtree type cache support
pwm: ftm: Correctly track usage count
pwm: samsung: Allow Samsung PWM driver to be enabled on Exynos7
pwm: add DT bindings documentation for atmel-hlcdc-pwm driver
pwm: add support for atmel-hlcdc-pwm device
pwm: Add BCM2835 PWM driver
Linus Torvalds [Wed, 17 Dec 2014 18:06:02 +0000 (10:06 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input subsystem updates from Dmitry Torokhov:
"Two new drivers for Elan hardware (for I2C touchpad and touchscreen
found in several Chromebooks and other devices), a driver for Goodix
touch panel, and small fixes to Cypress I2C trackpad and other input
drivers.
Also we switched to use __maybe_unused instead of gating suspend/
resume code with #ifdef guards to get better compile coverage"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (27 commits)
Input: gpio_keys - fix warning regarding uninitialized 'button' variable
Input: add support for Elan eKTH I2C touchscreens
Input: gpio_keys - fix warning regarding uninitialized 'irq' variable
Input: cyapa - use 'error' for error codes
Input: cyapa - fix resuming the device
Input: gpio_keys - add device tree support for interrupt only keys
Input: amikbd - allocate temporary keymap buffer on the stack
Input: amikbd - fix build if !CONFIG_HW_CONSOLE
Input: lm8323 - missing error check in lm8323_set_disable()
Input: initialize device counter variables with -1
Input: initialize input_no to -1 to avoid subtraction
Input: i8042 - do not try to load on Intel NUC D54250WYK
Input: atkbd - correct MSC_SCAN events for force_release keys
Input: cyapa - switch to using managed resources
Input: lifebook - use "static inline" instead of "inline" in lifebook.h
Input: touchscreen - use __maybe_unused instead of ifdef around suspend/resume
Input: mouse - use __maybe_unused instead of ifdef around suspend/resume
Input: misc - use __maybe_unused instead of ifdef around suspend/resume
Input: cap11xx - support for irq-active-high option
Input: cap11xx - add support for various cap11xx devices
...
Linus Torvalds [Wed, 17 Dec 2014 17:59:26 +0000 (09:59 -0800)]
Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd
Pull MTD updates from Brian Norris:
"Summary:
- Add device tree support for DoC3
- SPI NOR:
Refactoring, for better layering between spi-nor.c and its
driver users (e.g., m25p80.c)
New flash device support
Support 6-byte ID strings
- NAND:
New NAND driver for Allwinner SoC's (sunxi)
GPMI NAND: add support for raw (no ECC) access, for testing
purposes
Add ATO manufacturer ID
A few odd driver fixes
- MTD tests:
Allow testers to compensate for OOB bitflips in oobtest
Fix a torturetest regression
- nandsim: Support longer ID byte strings
And more"
* tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd: (63 commits)
mtd: tests: abort torturetest on erase errors
mtd: physmap_of: fix potential NULL dereference
mtd: spi-nor: allow NULL as chip name and try to auto detect it
mtd: nand: gpmi: add raw oob access functions
mtd: nand: gpmi: add proper raw access support
mtd: nand: gpmi: add gpmi_copy_bits function
mtd: spi-nor: factor out write_enable() for erase commands
mtd: spi-nor: add support for s25fl128s
mtd: spi-nor: remove the jedec_id/ext_id
mtd: spi-nor: add id/id_len for flash_info{}
mtd: nand: correct the comment of function nand_block_isreserved()
jffs2: Drop bogus if in comment
mtd: atmel_nand: replace memcpy32_toio/memcpy32_fromio with memcpy
mtd: cafe_nand: drop duplicate .write_page implementation
mtd: m25p80: Add support for serial flash Spansion S25FL132K
MTD: m25p80: fix inconsistency in m25p_ids compared to spi_nor_ids
mtd: spi-nor: improve wait-till-ready timeout loop
mtd: delete unnecessary checks before two function calls
mtd: nand: omap: Fix NAND enumeration on 3430 LDP
mtd: nand: add ATO manufacturer info
...
Linus Torvalds [Wed, 17 Dec 2014 17:52:49 +0000 (09:52 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
KEYS: remove a bogus NULL check
ima: Fix build failure on powerpc when TCG_IBMVTPM dependencies are not met
KEYS: Fix stale key registration at error path
Linus Torvalds [Wed, 17 Dec 2014 17:41:32 +0000 (09:41 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
"The first part makes sure we don't hold up umount with pending async
requests. In addition to being a cleanup, this is a small behavioral
change (for the better) and unlikely to break anything.
The second part prepares for a cleanup of the fuse device I/O code by
adding a helper for simple request submission, with some savings in
line numbers already realized"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use file_inode() in fuse_file_fallocate()
fuse: introduce fuse_simple_request() helper
fuse: reduce max out args
fuse: hold inode instead of path after release
fuse: flush requests on umount
fuse: don't wake up reserved req in fuse_conn_kill()
Yan, Zheng [Fri, 14 Nov 2014 14:39:13 +0000 (22:39 +0800)]
ceph: flush inline version
After converting inline data to normal data, the client needs to flush
the new i_inline_version (CEPH_INLINE_NONE) to the MDS. This commit makes
cap messages (sent to the MDS) contain inline_version and inline_data.
The client always converts inline data to normal data before a data write,
so the inline data length part is always zero.
Yan, Zheng [Fri, 14 Nov 2014 14:38:29 +0000 (22:38 +0800)]
ceph: convert inline data to normal data before data write
Before any data write, convert inline data to normal data and set
i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
inline data to the object contains 3 operations (CMPXATTR, WRITE and
SETXATTR). It compares an xattr named 'inline_version' to prevent
old data from overwriting newer data.
Yan, Zheng [Fri, 14 Nov 2014 14:36:18 +0000 (22:36 +0800)]
ceph: sync read inline data
We can't use getattr to fetch inline data while holding the Fr cap,
because it can cause a deadlock. If we need to sync read inline data,
drop cap refs first, then use getattr to fetch inline data.
Yan, Zheng [Fri, 14 Nov 2014 14:10:07 +0000 (22:10 +0800)]
ceph: fetch inline data when getting Fcr cap refs
We can't use getattr to fetch inline data after getting Fcr caps,
because it can cause a deadlock. The solution is to try bringing inline
data into the page cache when not holding any cap, and hope the inline
data page is still there after getting the Fcr caps. If the page
is still there, pin it in the page cache for later IO.
Yan, Zheng [Thu, 13 Nov 2014 06:40:37 +0000 (14:40 +0800)]
libceph: specify position of extent operation
Allow specifying the position of the extent operation in a multi-operation
OSD request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).
Ilya Dryomov [Fri, 21 Nov 2014 19:16:43 +0000 (22:16 +0300)]
rbd: don't treat CEPH_OSD_OP_DELETE as extent op
CEPH_OSD_OP_DELETE is not an extent op, stop treating it as such. This
sneaked in with discard patches - it's one of the three osd ops (the
other two are CEPH_OSD_OP_TRUNCATE and CEPH_OSD_OP_ZERO) that discard
is implemented with.
Yan, Zheng [Thu, 6 Nov 2014 07:09:41 +0000 (15:09 +0800)]
ceph: introduce global empty snap context
Current snapshot code does not properly handle moving an inode from one
empty snap realm to another empty snap realm. After changing the inode's
snap realm, some dirty pages' snap context may not be equal to the inode's
i_head_snap. This can trigger BUG() in ceph_put_wrbuffer_cap_refs().
The fix is to introduce a global empty snap context for all empty snap
realms. This avoids triggering the BUG() for filesystems with no snapshots.
ceph, rbd: delete unnecessary checks before two function calls
The functions ceph_put_snap_context() and iput() test whether their
argument is NULL and then return immediately. Thus the test around the
call is not needed.
This issue was detected by using the Coccinelle software.
Yan, Zheng [Wed, 22 Oct 2014 01:09:56 +0000 (18:09 -0700)]
ceph: introduce a new inode flag indicating if cached dentries are ordered
After creating/deleting/renaming file, offsets of sibling dentries may
change. So we can not use cached dentries to satisfy readdir. But we can
still use the cached dentries to conclude -ENOENT for lookup.
This patch introduces a new inode flag indicating if child dentries are
ordered. The flag is set at the same time marking a directory complete.
After creating/deleting/renaming file, we clear the flag on directory
inode. This prevents ceph_readdir() from using cached dentries to satisfy
readdir syscall.
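The interplay of the two flags can be modelled in a few lines. This is a toy Python sketch, not the ceph client code: the `DirCache` class, its attributes, and its methods are hypothetical names chosen for illustration.

```python
class DirCache:
    """Toy directory cache with 'complete' and 'ordered' flags."""

    def __init__(self, names):
        self.entries = list(names)   # cached dentries in readdir order
        self.complete = True         # every child is cached
        self.ordered = True          # cached offsets are still valid

    def modify(self, create=None, remove=None):
        # Creating, deleting or renaming shifts sibling offsets.
        if create is not None:
            self.entries.append(create)
        if remove is not None:
            self.entries.remove(remove)
        self.ordered = False

    def lookup_is_negative(self, name):
        # A complete cache can still answer -ENOENT on its own.
        return self.complete and name not in self.entries

    def readdir_from_cache(self):
        # Only trust cached order when the directory is complete and ordered.
        if self.complete and self.ordered:
            return list(self.entries)
        return None                  # caller must ask the MDS instead


d = DirCache(["a", "b"])
before = d.readdir_from_cache()
d.modify(create="c")
after = d.readdir_from_cache()
print(before, after, d.lookup_is_negative("zzz"))
```

Keeping the two flags separate is what lets a modified directory stop serving readdir from the cache while still answering negative lookups locally.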
Ilya Dryomov [Thu, 23 Oct 2014 12:32:57 +0000 (16:32 +0400)]
libceph: nuke ceph_kvfree()
Use kvfree() from linux/mm.h instead, which is identical. Also fix the
ceph_buffer comment: we will allocate with kmalloc() up to 32k - the
value of PAGE_ALLOC_COSTLY_ORDER, but that really is just an
implementation detail so don't mention it at all.
Yan, Zheng [Tue, 14 Oct 2014 02:33:35 +0000 (10:33 +0800)]
ceph: fix file lock interruption
When a lock operation is interrupted, current code sends an unlock request to
MDS to undo the lock operation. This method does not work as expected because
the unlock request can drop locks that have already been acquired.
The fix is to use the newly introduced CEPH_LOCK_FCNTL_INTR/CEPH_LOCK_FLOCK_INTR
requests to interrupt blocked file lock requests. These requests do not drop
locks that have already been acquired; they only interrupt blocked file lock
requests.
Takashi Iwai [Mon, 15 Dec 2014 12:47:25 +0000 (13:47 +0100)]
ALSA: hda - Add quirk for Packard Bell EasyNote MX65
Packard Bell EasyNote MX65 with AD1986A codec needs a few fixups,
namely, the pin config overrides to set only the known I/O pins and
the EAPD has to be turned on. In addition, forcibly add the stereo mix
input to avoid the weird KDE behavior caused by this update.
Mitchell Krome [Tue, 16 Dec 2014 02:16:12 +0000 (12:16 +1000)]
perf symbols: Fix use after free in filename__read_build_id
In filename__read_build_id, phdr points to memory in buf, which gets realloced
before a call to fseek that uses phdr->p_offset. This change stores the value
of p_offset before buf is realloced, so the fseek can use the value safely.
perf tools: Make the mmap length autotuning more robust
If /proc/sys/kernel/perf_event_mlock_kb is not (power of 2 + PAGE_SIZE_in_kb)
and we let the perf tools do mmap length autosizing based on that, then, for
non-CAP_IPC_LOCK users when /proc/sys/kernel/perf_event_paranoid is > -1,
we get an -EINVAL that ends up in:
[acme@ssdandy linux]$ trace usleep 1
Invalid argument
[acme@ssdandy linux]$ perf record usleep 1
failed to mmap with 22 (Invalid argument)
Will be used to make sure we pass a power of two when automatically
setting up the perf_mmap addr range length. As the kernel code
validating input on /proc/sys/kernel/perf_event_mlock_kb accepts any
integer, if we use it as-is to set up the mmap length, we may get an
EINVAL when passing a non power of two.
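The rounding described above can be sketched as follows. This is a minimal Python model, not the perf tools code: `auto_mmap_pages` is a hypothetical name, the 4096-byte page size is an assumption, and one page is assumed to be set aside for the control page, in line with the "power of 2 + PAGE_SIZE_in_kb" wording.

```python
def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def rounddown_pow_of_two(n):
    """Largest power of two that is <= n (n must be positive)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n.bit_length() - 1)


def auto_mmap_pages(mlock_kb, page_size=4096):
    """Data pages to request, derived from a perf_event_mlock_kb value."""
    # Assumption: one page of the budget is reserved for the control page.
    pages = (mlock_kb * 1024) // page_size - 1
    if pages > 0 and not is_power_of_two(pages):
        pages = rounddown_pow_of_two(pages)
    return pages


for kb in (516, 600, 1028):
    print(kb, auto_mmap_pages(kb))
```

Rounding down rather than up keeps the request within whatever limit the administrator wrote to the sysctl.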
Dmitry V. Levin [Tue, 16 Dec 2014 03:59:37 +0000 (06:59 +0300)]
vfs: make mounts and mountstats honor root dir like mountinfo does
As we already show mountpoints relative to the root directory, thanks
to the change made back in 2000, change show_vfsmnt() and show_vfsstat()
to skip out-of-root mountpoints the same way as show_mountinfo() does.
Dmitry V. Levin [Wed, 17 Oct 2012 16:29:36 +0000 (20:29 +0400)]
vfs: cleanup show_mountinfo
Starting with commit v3.2-rc4-1-g02125a8, seq_path_root() no longer
changes the value of its "struct path *root" argument.
Starting with commit v3.2-rc7-104-g8c9379e, the "struct path *root"
argument of seq_path_root() is const.
As a result, the temporary variable "root" in show_mountinfo() that
holds a copy of struct path root is no longer needed.
Miklos Szeredi [Thu, 20 Nov 2014 15:08:59 +0000 (16:08 +0100)]
init: fix read-write root mount
If mount flags don't have MS_RDONLY, iso9660 returns EACCES without actually
checking if it's an iso image.
This tricks mount_block_root() into retrying with MS_RDONLY. This results
in a read-only root despite the "rw" boot parameter if the actual
filesystem was checked after iso9660.
I believe the behavior of iso9660 is okay, while that of mount_block_root()
is not. It should rather try all types without MS_RDONLY and only then
retry with MS_RDONLY.
This change also makes the code more robust against the case when EACCES is
returned despite MS_RDONLY, which would've resulted in a lockup.
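The retry order argued for above can be sketched like this. It is a simplified Python model of the control flow only, not the code in init/do_mounts.c: `mount_root`, `try_mount`, and `fake_mount` are hypothetical names, and the callback stands in for an in-kernel mount attempt.

```python
import errno

MS_RDONLY = 1  # same bit value as in <sys/mount.h>


def mount_root(fs_types, try_mount, flags=0):
    """Try every type with the given flags; fall back to read-only last.

    try_mount(fstype, flags) is a hypothetical callback returning 0 on
    success or a negative errno. Returns (fstype, flags) on success.
    """
    while True:
        for fstype in fs_types:
            err = try_mount(fstype, flags)
            if err == 0:
                return fstype, flags
            if err in (-errno.EACCES, -errno.EINVAL):
                continue              # wrong type or refused: try the next one
            raise OSError(-err, "unexpected mount error")
        if flags & MS_RDONLY:
            # Already read-only: give up instead of looping forever.
            raise OSError(errno.ENODEV, "no filesystem could mount root")
        flags |= MS_RDONLY            # only now retry everything read-only


def fake_mount(fstype, flags):
    # iso9660 refuses read-write mounts before looking at the image.
    if fstype == "iso9660":
        return -errno.EINVAL if (flags & MS_RDONLY) else -errno.EACCES
    return 0 if fstype == "ext4" else -errno.EINVAL


result = mount_root(["iso9660", "ext4"], fake_mount)
print(result)
```

Because the read-only retry happens only after the whole list has been tried, an early EACCES from iso9660 no longer forces a read-only root, and a persistent EACCES ends in an error rather than an endless loop.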
scanarg(s, del) never returns s; the empty field results in s + 1.
Restore the correct checks, and move NUL-termination into scanarg(),
while we are at it.
Incidentally, mixing "coding style cleanups" (for small values of cleanup)
with functional changes is a Bad Idea(tm)...
Johannes Berg [Wed, 17 Dec 2014 12:55:49 +0000 (13:55 +0100)]
mac80211: free management frame keys when removing station
When writing the code to allow per-station GTKs, I neglected to
take into account the management frame keys (index 4 and 5) when
freeing the station and only added code to free the first four
data frame keys.
Fix this by iterating the array of keys over the right length.
Currently the H_CONFER hcall is implemented in kernel virtual mode,
meaning that whenever a guest thread does an H_CONFER, all the threads
in that virtual core have to exit the guest. This is bad for
performance because it interrupts the other threads even if they
are doing useful work.
The H_CONFER hcall is called by a guest VCPU when it is spinning on a
spinlock and it detects that the spinlock is held by a guest VCPU that
is currently not running on a physical CPU. The idea is to give this
VCPU's time slice to the holder VCPU so that it can make progress
towards releasing the lock.
To avoid having the other threads exit the guest unnecessarily,
we add a real-mode implementation of H_CONFER that checks whether
the other threads are doing anything. If all the other threads
are idle (i.e. in H_CEDE) or trying to confer (i.e. in H_CONFER),
it returns H_TOO_HARD which causes a guest exit and allows the
H_CONFER to be handled in virtual mode.
Otherwise it spins for a short time (up to 10 microseconds) to give
other threads the chance to observe that this thread is trying to
confer. The spin loop also terminates when any thread exits the guest
or when all other threads are idle or trying to confer. If the
timeout is reached, the H_CONFER returns H_SUCCESS. In this case the
guest VCPU will recheck the spinlock word and most likely call
H_CONFER again.
This also improves the implementation of the H_CONFER virtual mode
handler. If the VCPU is part of a virtual core (vcore) which is
runnable, there will be a 'runner' VCPU which has taken responsibility
for running the vcore. In this case we yield to the runner VCPU
rather than the target VCPU.
We also introduce a check on the target VCPU's yield count: if it
differs from the yield count passed to H_CONFER, the target VCPU
has run since H_CONFER was called and may have already released
the lock. This check is required by PAPR.
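The entry decision and the yield-count check described above can be sketched roughly as follows. This is a simplified Python model, not the real-mode handler: the state strings, `confer_entry`, and `target_still_needs_cpu` are hypothetical names, and the spin loop itself is left out.

```python
IDLE_STATES = {"ceded", "conferring"}   # i.e. in H_CEDE or in H_CONFER


def all_others_idle(other_threads):
    """True when no other thread in the vcore is doing useful work."""
    return all(state in IDLE_STATES for state in other_threads)


def confer_entry(other_threads):
    """First decision of the real-mode handler."""
    if all_others_idle(other_threads):
        return "H_TOO_HARD"   # exit the guest; virtual mode handles it
    return "spin"             # wait briefly (up to 10 us per the text)


def target_still_needs_cpu(target_yield_count, passed_yield_count):
    """Yield-count check: a changed count means the target has run."""
    return target_yield_count == passed_yield_count


print(confer_entry(["running", "ceded"]), confer_entry(["ceded", "conferring"]))
```

Only the case where every other thread is idle or conferring pays for a guest exit; otherwise the conferring thread waits without disturbing threads that are doing useful work.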
Paul Mackerras [Wed, 3 Dec 2014 02:30:39 +0000 (13:30 +1100)]
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
There are two ways in which a guest instruction can be obtained from
the guest in the guest exit code in book3s_hv_rmhandlers.S. If the
exit was caused by a Hypervisor Emulation interrupt (i.e. an illegal
instruction), the offending instruction is in the HEIR register
(Hypervisor Emulation Instruction Register). If the exit was caused
by a load or store to an emulated MMIO device, we load the instruction
from the guest by turning data relocation on and loading the instruction
with an lwz instruction.
Unfortunately, in the case where the guest has opposite endianness to
the host, these two methods give results of different endianness, but
both get put into vcpu->arch.last_inst. The HEIR value has been loaded
using guest endianness, whereas the lwz will load the instruction using
host endianness. The rest of the code that uses vcpu->arch.last_inst
assumes it was loaded using host endianness.
To fix this, we define a new vcpu field to store the HEIR value. Then,
in kvmppc_handle_exit_hv(), we transfer the value from this new field to
vcpu->arch.last_inst, doing a byte-swap if the guest and host endianness
differ.
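The conditional byte-swap can be illustrated with a short sketch. The Python below is an illustration only: `swab32` and `last_inst_from_heir` are hypothetical names, and the endianness flags are passed in explicitly rather than read from vcpu state.

```python
def swab32(value):
    """Reverse the byte order of a 32-bit word."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def last_inst_from_heir(heir, guest_little_endian, host_little_endian):
    """Return the HEIR value as a host-endian instruction word."""
    if guest_little_endian != host_little_endian:
        return swab32(heir)
    return heir


print(hex(last_inst_from_heir(0x7C0802A6, True, False)))
```

Doing the swap once, at the point where the value is copied into the shared field, lets the rest of the code keep assuming host endianness.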
Paul Mackerras [Wed, 3 Dec 2014 02:30:38 +0000 (13:30 +1100)]
KVM: PPC: Book3S HV: Remove code for PPC970 processors
This removes the code that was added to enable HV KVM to work
on PPC970 processors. The PPC970 is an old CPU that doesn't
support virtualizing guest memory. Removing PPC970 support also
lets us remove the code for allocating and managing contiguous
real-mode areas, the code for the !kvm->arch.using_mmu_notifiers
case, the code for pinning pages of guest memory when first
accessed and keeping track of which pages have been pinned, and
the code for handling H_ENTER hypercalls in virtual mode.
Book3S HV KVM is now supported only on POWER7 and POWER8 processors.
The KVM_CAP_PPC_RMA capability now always returns 0.
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
This patch adds trace points in the guest entry and exit code and also
for exceptions handled by the host in kernel mode - hypercalls and page
faults. The new events are added to /sys/kernel/debug/tracing/events
under a new subsystem called kvm_hv.
Paul Mackerras [Thu, 4 Dec 2014 05:43:28 +0000 (16:43 +1100)]
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
Currently the calculations of stolen time for PPC Book3S HV guests
uses fields in both the vcpu struct and the kvmppc_vcore struct. The
fields in the kvmppc_vcore struct are protected by the
vcpu->arch.tbacct_lock of the vcpu that has taken responsibility for
running the virtual core. This works correctly but confuses lockdep,
because it sees that the code takes the tbacct_lock for a vcpu in
kvmppc_remove_runnable() and then takes another vcpu's tbacct_lock in
vcore_stolen_time(), and it thinks there is a possibility of deadlock,
causing it to print reports like this:
=============================================
[ INFO: possible recursive locking detected ]
3.18.0-rc7-kvm-00016-g8db4bc6 #89 Not tainted
---------------------------------------------
qemu-system-ppc/6188 is trying to acquire lock:
(&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb1fe8>] .vcore_stolen_time+0x48/0xd0 [kvm_hv]
but task is already holding lock:
(&(&vcpu->arch.tbacct_lock)->rlock){......}, at: [<d00000000ecb25a0>] .kvmppc_remove_runnable.part.3+0x30/0xd0 [kvm_hv]
other info that might help us debug this:
Possible unsafe locking scenario:
In order to make the locking easier to analyse, we change the code to
use a spinlock in the kvmppc_vcore struct to protect the stolen_tb and
preempt_tb fields. This lock needs to be an irq-safe lock since it is
used in the kvmppc_core_vcpu_load_hv() and kvmppc_core_vcpu_put_hv()
functions, which are called with the scheduler rq lock held, which is
an irq-safe lock.
When running in non-cache coherent configuration the memory that was
allocated with dma_alloc_coherent() has a custom mapping and so there is no
1-to-1 relationship between the kernel virtual address and the PFN. This
means that virt_to_pfn() will not work correctly for those addresses and the
default mmap implementation in the form of dma_common_mmap() will map some
random, but not the requested, memory area.
Fix this by providing a custom mmap implementation that looks up the PFN
from the page table rather than using virt_to_pfn.
Al Viro [Fri, 12 Dec 2014 03:40:27 +0000 (22:40 -0500)]
lustre: get rid of playing with ->fs
* removed several pieces of dead code in lustre_compat25.h
* don't open-code current_umask() (and BTW, 0755 & (S_IRWXUGO | S_ISVTX)
is better spelled as 0755)
* fix broken attempt to get the pathname by dentry - abusing d_path() for
that is simply wrong.
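The remark that 0755 & (S_IRWXUGO | S_ISVTX) is just 0755 is easy to check. A small Python sketch, assuming the usual definition of the kernel macro S_IRWXUGO as the union of the three rwx masks:

```python
import stat

# S_IRWXUGO is a kernel-side macro; rebuild it from the per-class masks.
S_IRWXUGO = stat.S_IRWXU | stat.S_IRWXG | stat.S_IRWXO
mask = S_IRWXUGO | stat.S_ISVTX

masked = 0o755 & mask
print(oct(mask), oct(masked))
```

Since every bit of 0755 is already inside the mask, the AND changes nothing.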
Ley Foon Tan [Wed, 17 Dec 2014 05:53:41 +0000 (13:53 +0800)]
nios2/uaccess: fix sparse errors
virtio wants to read bitwise types from userspace using get_user. At the
moment this triggers sparse errors, since the value is passed through an
integer.
Hans Verkuil [Tue, 2 Dec 2014 15:40:33 +0000 (12:40 -0300)]
[media] bq/c-qcam, w9966, pms: move to staging in preparation for removal
These drivers haven't been tested in a long, long time. The hardware is
ancient and hopelessly obsolete. These drivers also need to be converted
to newer media frameworks but due to the lack of hardware that's going
to be impossible. In addition, cheaper and vastly better hardware is
available today.
So these drivers are a prime candidate for removal. If someone is
interested in working on these drivers to prevent their removal, then
please contact the linux-media mailing list.
Let's be honest, the age of parallel port webcams and ISA video capture
boards is really gone.
Hans Verkuil [Tue, 2 Dec 2014 15:40:32 +0000 (12:40 -0300)]
[media] tlg2300: move to staging in preparation for removal
This driver hasn't been tested in a long, long time. The company that made
this chip went bust many years ago and hardware using this chip is next
to impossible to find.
This driver needs to be converted to newer media frameworks but due to the
lack of hardware that's going to be impossible. Since cheap alternatives are
easily available, there is little point in keeping this driver alive.
In other words, this driver is a prime candidate for removal. If someone is
interested in working on this driver to prevent its removal, then please
contact the linux-media mailing list.
Hans Verkuil [Tue, 2 Dec 2014 15:40:31 +0000 (12:40 -0300)]
[media] vino/saa7191: move to staging in preparation for removal
These drivers haven't been tested in a long, long time. The hardware is
ancient and hopelessly obsolete. These drivers also need to be converted
to newer media frameworks but due to the lack of hardware that's going
to be impossible.
So these drivers are a prime candidate for removal. If someone is
interested in working on these drivers to prevent their removal, then
please contact the linux-media mailing list.
The start_streaming op is responsible for starting the video dma,
so it shouldn't be called anymore from the buf_queue op.
Unfortunately, when the call to start_video_dma() was added to the
start_streaming op, it was not removed from the buf_queue op, which is
where it used to be before the vb2 conversion.
Calling this function twice causes very hard to find errors: sometimes
it works, sometimes it doesn't. It took me a whole friggin' day
to track this down, and in the end it was just luck that my eye suddenly
triggered on that line.
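
The division of labour described above can be modelled in a few lines. This is a toy sketch, not the driver code: struct demo_dev and the demo_* functions are hypothetical, and the real vb2 ops take different arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device state: counts queued buffers and DMA start requests. */
struct demo_dev {
    int queued;
    int dma_starts;
};

/* Stand-in for the routine that programs and starts the video DMA. */
static void demo_start_video_dma(struct demo_dev *dev)
{
    dev->dma_starts++;
}

/* buf_queue only hands the buffer to the driver; it must not start DMA. */
static void demo_buf_queue(struct demo_dev *dev)
{
    assert(dev != NULL);
    dev->queued++;
}

/* start_streaming is the single place where DMA gets started. */
static int demo_start_streaming(struct demo_dev *dev)
{
    assert(dev != NULL);
    demo_start_video_dma(dev);
    return 0;
}
```

With this split, queueing any number of buffers and then starting the stream starts the DMA exactly once.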
Hans Verkuil [Mon, 8 Dec 2014 16:23:49 +0000 (13:23 -0300)]
[media] cx88: add missing alloc_ctx support
The cx88 vb2 conversion and the vb2 dma_sg improvements were developed separately and
were merged separately. Unfortunately, the patch updating drivers to the dma_sg
improvements didn't take the updated cx88 driver into account. Basically two ships
passing in the night, unaware of one another even though both ships have the same
owner, i.e. me :-)
Hans Verkuil [Fri, 5 Dec 2014 13:02:47 +0000 (10:02 -0300)]
[media] v4l2-mediabus.h: use two __u16 instead of two __u32
The ycbcr_enc and quantization fields do not need a __u32. Switch to
two __u16 types, thus preserving alignment and avoiding holes in the
struct. This makes one more __u32 available for future expansion.
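
A rough way to see the effect on the layout, assuming a struct shaped loosely like the media bus format: the two demo structs, their leading fields and the reserved array lengths are illustrative assumptions, not copies of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: both fields take a full 32-bit word. */
struct demo_fmt_before {
    uint32_t width, height, code, field, colorspace;
    uint32_t ycbcr_enc;
    uint32_t quantization;
    uint32_t reserved[5];
};

/* Hypothetical "after" layout: the two fields share one 32-bit word,
 * which frees a word that goes to the reserved array. */
struct demo_fmt_after {
    uint32_t width, height, code, field, colorspace;
    uint16_t ycbcr_enc;
    uint16_t quantization;
    uint32_t reserved[6];
};

/* The overall size is unchanged ... */
static_assert(sizeof(struct demo_fmt_before) == sizeof(struct demo_fmt_after),
              "struct size must not change");
/* ... and the reserved words stay 32-bit aligned, so no hole appears. */
static_assert(offsetof(struct demo_fmt_after, reserved) % sizeof(uint32_t) == 0,
              "reserved[] must stay naturally aligned");
```

Pairing the two 16-bit fields is what keeps the following 32-bit members aligned; a single 16-bit field on its own would leave a two-byte hole.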
Linus Torvalds [Tue, 16 Dec 2014 23:53:03 +0000 (15:53 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile #2 from Al Viro:
"Next pile (and there'll be one or two more).
The large piece in this one is getting rid of /proc/*/ns/* weirdness;
among other things, it allows us to (finally) make nameidata completely
opaque outside of fs/namei.c, making for easier further cleanups in
there"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
coda_venus_readdir(): use file_inode()
fs/namei.c: fold link_path_walk() call into path_init()
path_init(): don't bother with LOOKUP_PARENT in argument
fs/namei.c: new helper (path_cleanup())
path_init(): store the "base" pointer to file in nameidata itself
make default ->i_fop have ->open() fail with ENXIO
make nameidata completely opaque outside of fs/namei.c
kill proc_ns completely
take the targets of /proc/*/ns/* symlinks to separate fs
bury struct proc_ns in fs/proc
copy address of proc_ns_ops into ns_common
new helpers: ns_alloc_inum/ns_free_inum
make proc_ns_operations work with struct ns_common * instead of void *
switch the rest of proc_ns_operations to working with &...->ns
netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns
make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns
common object embedded into various struct ....ns
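
The "common object embedded into various struct ....ns" pattern can be sketched in plain C as follows. Everything here is hypothetical and simplified for illustration: the demo_* types, their fields, and the demo_container_of() macro, which imitates the kernel's container_of().

```c
#include <assert.h>
#include <stddef.h>

/* Common part shared by every namespace type in this sketch. */
struct demo_ns_common {
    unsigned int inum;
};

/* Two unrelated namespace types, each embedding the common object. */
struct demo_net {
    int ifcount;
    struct demo_ns_common ns;
};

struct demo_mnt_ns {
    struct demo_ns_common ns;
    int mounts;
};

/* Recover the enclosing struct from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic code works on the common part instead of a void pointer. */
static unsigned int demo_ns_inum(const struct demo_ns_common *ns)
{
    assert(ns != NULL);
    return ns->inum;
}

/* Type-specific code converts back when it needs the full object. */
static struct demo_net *demo_to_net(struct demo_ns_common *ns)
{
    assert(ns != NULL);
    return demo_container_of(ns, struct demo_net, ns);
}
```

Callers pass &net->ns or &mnt_ns->ns to the generic helpers, which is the shape the "switch ... to working with &...->ns" entries in the list above describe.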
Linus Torvalds [Tue, 16 Dec 2014 23:46:01 +0000 (15:46 -0800)]
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs and reiserfs fixes from Jan Kara:
"A reiserfs and an isofs fix. They arrived after I sent you my first
pull request and I don't want to delay them unnecessarily till rc2"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
isofs: Fix infinite looping over CE entries
reiserfs: destroy allocated commit workqueue
Linus Torvalds [Tue, 16 Dec 2014 23:25:31 +0000 (15:25 -0800)]
Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:
- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).
- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"
* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...
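
Several of the sunrpc entries above fold separate boolean fields into one flags word driven by atomic bit operations. A userspace approximation using C11 atomics might look like this; the DEMO_RQ_* names, struct demo_rqst and the demo_*_bit() helpers are hypothetical stand-ins for the kernel's own flag names and bitops.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One bit number per former boolean field (names are illustrative). */
enum demo_rq_flag {
    DEMO_RQ_SECURE,
    DEMO_RQ_LOCAL,
    DEMO_RQ_USEDEFERRAL,
    DEMO_RQ_DROPME,
    DEMO_RQ_SPLICE_OK,
    DEMO_RQ_NFLAGS
};

static_assert(DEMO_RQ_NFLAGS <= sizeof(unsigned long) * CHAR_BIT,
              "all flags must fit in one word");

/* The request carries a single flags word instead of several bools. */
struct demo_rqst {
    atomic_ulong flags;
};

/* Atomically set bit nr in the flags word. */
static void demo_set_bit(int nr, atomic_ulong *word)
{
    atomic_fetch_or(word, 1UL << nr);
}

/* Atomically clear bit nr in the flags word. */
static void demo_clear_bit(int nr, atomic_ulong *word)
{
    atomic_fetch_and(word, ~(1UL << nr));
}

/* Read bit nr from the flags word. */
static bool demo_test_bit(int nr, atomic_ulong *word)
{
    return (atomic_load(word) >> nr) & 1UL;
}
```

Packing the flags saves space in the request structure and lets each flag be updated without taking a lock.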
Linus Torvalds [Tue, 16 Dec 2014 22:53:01 +0000 (14:53 -0800)]
Merge tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC/iommu configuration update from Arnd Bergmann:
"The iommu-config branch contains work from Will Deacon, quoting his
description:
This series adds automatic IOMMU and DMA-mapping configuration for
OF-based DMA masters described using the generic IOMMU devicetree
bindings. Although there is plenty of future work around splitting up
iommu_ops, adding default IOMMU domains and sorting out automatic IOMMU
group creation for the platform_bus, this is already useful enough for
people to port over their IOMMU drivers and start using the new probing
infrastructure (indeed, Marek has patches queued for the Exynos IOMMU).
The branch touches core ARM and IOMMU driver files, and the respective
maintainers (Russell King and Joerg Roedel) agreed to have the
contents merged through the arm-soc tree.
The final version was ready just before the merge window, so we ended
up delaying it a bit longer than the rest, but we don't expect to see
regressions because this is just additional infrastructure that will
get used in drivers starting in 3.20 but is unused so far"
* tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
iommu: store DT-probed IOMMU data privately
arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops
arm: call iommu_init before of_platform_populate
dma-mapping: detect and configure IOMMU in of_dma_configure
iommu: fix initialization without 'add_device' callback
iommu: provide helper function to configure an IOMMU for an of master
iommu: add new iommu_ops callback for adding an OF device
dma-mapping: replace set_arch_dma_coherent_ops with arch_setup_dma_ops
iommu: provide early initialisation hook for IOMMU drivers
Linus Torvalds [Tue, 16 Dec 2014 22:26:26 +0000 (14:26 -0800)]
Merge tag 'dt2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC DT updates part 2 from Arnd Bergmann:
"This is a follow-up to the early ARM SoC DT changes, with additional
content that has external dependencies:
- The Tegra IOMMU DT support depends on changes from the iommu tree,
plus the contents of the arm-soc drivers branch
- The MVEBU PHY support depends on changes from the phy tree
- The AT91 DT support depends on changes from the RTC and DMA-slave
trees
All of these changes just enable additional devices for existing
platforms"
* tag 'dt2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: tegra: Enable IOMMU for display controllers on Tegra124
ARM: tegra: Enable IOMMU for display controllers on Tegra114
ARM: tegra: Enable IOMMU for display controllers on Tegra30
ARM: tegra: Add memory controller support for Tegra124
ARM: tegra: Add memory controller support for Tegra114
ARM: tegra: Add memory controller support for Tegra30
ARM: tegra: Add APB_MISC_GP as a MIPI pad control bank
ARM: mvebu: add PHY support to the dts for the USB controllers on Armada 375
ARM: mvebu: add Device Tree description of USB cluster controller on Armada 375
ARM: at91/dt: at91sam9g45: add ISI node
ARM: at91/dt: enable the RTT block on the at91sam9m10g45ek board
ARM: at91/dt: enable the RTT block on the sam9g20ek board
ARM: at91/dt: add GPBR nodes
ARM: at91/dt: add RTT nodes to at91 dtsis
ARM: at91/dt: at91sam9rl: add rtc
ARM: at91: fix GPLv2 wording
ARM: at91/dt: sama5d4: add DMA support
ARM: at91/dt: sama5d4: use macro instead of numeric value
Linus Torvalds [Tue, 16 Dec 2014 22:17:36 +0000 (14:17 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are the first arm-soc bug fixes. Most of these are OMAP related
fixes for regressions or minor bugs. Aside from that, there are a few
defconfig changes for various platforms"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
iommu/exynos: Fix arm64 allmodconfig build
ARM: defconfigs: use CONFIG_CPUFREQ_DT
ARM: omap2plus_defconfig: Enable AHCI_PLATFORM driver
ARM: dts: am437x-sk-evm.dts: fix LCD timings
ARM: dts: dra7-evm: Update SMPS7 (VDD_CORE) max voltage to match DM
ARM: dts: dra7-evm: Fix typo in SMPS6 (VDD_GPU) max voltage
ARM: OMAP2+: AM43x: Add ID for ES1.2
ARM: dts: am437x-sk: fix lcd enable pin mux data
ARM: dts: Fix gpmc regression for omap 2430sdp smc91x
Revert "ARM: shmobile: multiplatform: add Audo DMAC peri peri support on defconfig"
ARM: dts: dra7: fix DSS PLL clock mux registers
ARM: dts: DRA7: wdt: Fix compatible property for watchdog node
ARM: OMAP2+: clock: remove unused function prototype