fsl/usb: Resolve PHY_CLK_VLD instability issue for ULPI phy
For controller versions greater than 1.6, setting ULPI_PHY_CLK_SEL
bit when USB_EN bit is already set causes instability issues with
PHY_CLK_VLD bit. So USB_EN is set only for IP controller version
below 1.6 before setting ULPI_PHY_CLK_SEL bit
Mark Tinguely [Mon, 23 Sep 2013 17:18:58 +0000 (12:18 -0500)]
xfs: fix node forward in xfs_node_toosmall
Commit f5ea1100 cleans up the disk to host conversions for
node directory entries, but because a variable is reused in
xfs_node_toosmall() the next node is not correctly found.
If the original node is small enough (<= 3/8 of the node size),
this change may incorrectly cause a node collapse when it should
not. That will cause an assert in xfstest generic/319:
Assertion failed: first <= last && last < BBTOB(bp->b_length),
file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569
Keep the original node header to get the correct forward node.
(When a node is considered for a merge with a sibling, it overwrites the
sibling pointers of the original incore nodehdr with the sibling's
pointers. This leads to loop considering the original node as a merge
candidate with itself in the second pass, and so it incorrectly
determines a merge should occur.)
Signed-off-by: Mark Tinguely <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>
[v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
cleaned up whitespace. -bpm]
Henrik Rydberg [Thu, 26 Sep 2013 06:33:16 +0000 (08:33 +0200)]
hwmon: (applesmc) Check key count before proceeding
After reports from Chris and Josh Boyer of a rare crash in applesmc,
Guenter pointed at the initialization problem fixed below. The patch
has not been verified to fix the crash, but should be applied
regardless.
of_get_display_timing(s) use of_find_node_by_name
to get child node, this is incorrect, of_get_child_by_name
should be used instead. The patch fixes it.
Small typo is also corrected.
The nice thing about devm_* is that the driver doesn't need to free the
resources but the driver core takes care about that. This also
simplifies the error path quite a bit and removes the wrong check for a
clock pointer being NULL.
Russell King - ARM Linux <[email protected]>:
"And this patch also fixes the above: disabling/unpreparing _after_ putting
the thing - which was quite silly... :)"
Mengdong Lin [Sun, 22 Sep 2013 00:34:45 +0000 (20:34 -0400)]
ALSA : hda - not use assigned converters for all unused pins
BIOS can mark a pin as "no physical connection" if the port is used by an
integrated display which is not audio capable. And audio driver will overlook
such pins.
On Haswell, such a disconneted pin will keep muted and connected to the 1st
converter by default. But if the 1st convertor is assigned to a connected pin
for audio streaming. The muted disconnected pin can make the connected pin
no sound output.
So this patch avoids using assigned converters for all unused pins for Haswell,
including the disconected pins.
ALSA: compress: Make sure we trigger STOP before closing the stream.
Currently we assume that userspace will shut down the compressed stream
correctly. However, if userspcae dies (e.g. cplay & ctrl-C) we dont
stop the stream before freeing it.
This now checks that the stream is stopped before freeing.
Peter Hurley [Thu, 26 Sep 2013 00:13:04 +0000 (20:13 -0400)]
tty: Fix SIGTTOU not sent with tcflush()
Commit 'e7f3880cd9b98c5bf9391ae7acdec82b75403776'
tty: Fix recursive deadlock in tty_perform_flush()
introduced a regression where tcflush() does not generate
SIGTTOU for background process groups.
Make sure ioctl(TCFLSH) calls tty_check_change() when
invoked from the line discipline.
Kurt Garloff [Tue, 24 Sep 2013 12:13:48 +0000 (14:13 +0200)]
usb/core/devio.c: Don't reject control message to endpoint with wrong direction bit
Trying to read data from the Pegasus Technologies NoteTaker (0e20:0101)
[1] with the Windows App (EasyNote) works natively but fails when
Windows is running under KVM (and the USB device handed to KVM).
The reason is a USB control message
usb 4-2.2: control urb: bRequestType=22 bRequest=09 wValue=0200 wIndex=0001 wLength=0008
This goes to endpoint address 0x01 (wIndex); however, endpoint address
0x01 does not exist. There is an endpoint 0x81 though (same number,
but other direction); the app may have meant that endpoint instead.
The kernel thus rejects the IO and thus we see the failure.
Apparently, Linux is more strict here than Windows ... we can't change
the Win app easily, so that's a problem.
It seems that the Win app/driver is buggy here and the driver does not
behave fully according to the USB HID class spec that it claims to
belong to. The device seems to happily deal with that though (and
seems to not really care about this value much).
So the question is whether the Linux kernel should filter here.
Rejecting has the risk that somewhat non-compliant userspace apps/
drivers (most likely in a virtual machine) are prevented from working.
Not rejecting has the risk of confusing an overly sensitive device with
such a transfer. Given the fact that Windows does not filter it makes
this risk rather small though.
The patch makes the kernel more tolerant: If the endpoint address in
wIndex does not exist, but an endpoint with toggled direction bit does,
it will let the transfer through. (It does NOT change the message.)
With attached patch, the app in Windows in KVM works.
usb 4-2.2: check_ctrlrecip: process 13073 (qemu-kvm) requesting ep 01 but needs 81
I suspect this will mostly affect apps in virtual environments; as on
Linux the apps would have been adapted to the stricter handling of the
kernel. I have done that for mine[2].
Peter Chen [Tue, 17 Sep 2013 04:37:24 +0000 (12:37 +0800)]
usb: chipidea: udc: free pending TD at removal procedure
There is a pending TD which is not freed after request finishes,
we do this due to a controller bug. This TD needs to be freed when
the driver is removed. It prints below error message when unload
chipidea driver at current code:
"ci_hdrc ci_hdrc.0: dma_pool_destroy ci_hw_td, b0001000 busy"
It indicates the buffer at dma pool are still in use.
This commit will free the pending TD at driver's removal procedure,
it can fix the problem described above.
Peter Chen [Tue, 17 Sep 2013 04:37:23 +0000 (12:37 +0800)]
usb: chipidea: imx: Add usb_phy_shutdown at probe's error path
If not, the PHY will be active even the controller is not in use.
We find this issue due to the PHY's clock refcount is not correct
due to -EPROBE_DEFER return after phy's init.
Peter Chen [Tue, 17 Sep 2013 04:37:20 +0000 (12:37 +0800)]
usb: chipidea: udc: fix the oops after rmmod gadget
When we rmmod gadget, the ci->driver needs to be cleared.
Otherwise, when we plug in usb cable again, the driver will
consider gadget is there, and go to enumeration procedure,
but in fact, it was removed.
Clocksource devices provided by DT can be disabled (status != "okay").
Instead of registering clocksource drivers for disabled nodes, respect
the device's status by skiping disabled nodes.
Tomasz Figa [Wed, 25 Sep 2013 10:00:59 +0000 (12:00 +0200)]
clocksource: exynos_mct: Set IRQ affinity when the CPU goes online
Some variants of Exynos MCT, namely exynos4210-mct at the moment, use
normal, shared interrupts for local timers. This means that each
interrupt must have correct affinity set to fire only on CPU
corresponding to given local timer.
However after recent conversion of clocksource drivers to not use the
local timer API for local timer initialization any more, the point of
time when local timers get initialized changed and irq_set_affinity()
fails because the CPU is not marked as online yet.
This patch fixes this by moving the call to irq_set_affinity() to
CPU_ONLINE notification, so the affinity is being set when the CPU goes
online.
This fixes a regression introduced by commit ee98d27df6 ARM: EXYNOS4: Divorce mct from local timer API
which rendered all Exynos4210 based boards unbootable due to
failing irq_set_affinity() making local timers inoperatible.
Alan Stern [Tue, 24 Sep 2013 19:45:25 +0000 (15:45 -0400)]
USB: fix PM config symbol in uhci-hcd, ehci-hcd, and xhci-hcd
Since uhci-hcd, ehci-hcd, and xhci-hcd support runtime PM, the .pm
field in their pci_driver structures should be protected by CONFIG_PM
rather than CONFIG_PM_SLEEP. The corresponding change has already
been made for ohci-hcd.
Without this change, controllers won't do runtime suspend if system
suspend or hibernation isn't enabled.
Alan Stern [Tue, 24 Sep 2013 19:46:45 +0000 (15:46 -0400)]
USB: OHCI: accept very late isochronous URBs
Commit 24f531371de1 (USB: EHCI: accept very late isochronous URBs)
changed the isochronous API provided by ehci-hcd. URBs submitted too
late, so that the time slots for all their packets have already
expired, are no longer rejected outright. Instead the submission is
accepted, and the URB completes normally with a -EXDEV error for each
packet. This is what client drivers expect.
This patch implements the same policy in ohci-hcd. The change is more
complicated than it was in ehci-hcd, because ohci-hcd doesn't scan for
isochronous completions in the same way as ehci-hcd does. Rather, it
depends on the hardware adding completed TDs to a "done queue". Some
OHCI controller don't handle this properly when a TD's time slot has
already expired, so we have to avoid adding such TDs to the schedule
in the first place. As a result, if the URB was submitted too late
then none of its TDs will get put on the schedule, so none of them
will end up on the done queue, so the driver will never realize that
the URB should be completed.
To solve this problem, the patch adds one to urb_priv->td_cnt for such
URBs, making it larger than urb_priv->length (td_cnt already gets set
to the number of TD's that had to be skipped because their slots have
expired). Each time an URB is given back, the finish_urb() routine
looks to see if urb_priv->td_cnt for the next URB on the same endpoint
is marked in this way. If so, it gives back the next URB right away.
This should be applied to all kernels containing commit 815fa7b91761
(USB: OHCI: fix logic for scheduling isochronous URBs).
Alan Stern [Tue, 24 Sep 2013 19:47:20 +0000 (15:47 -0400)]
USB: UHCI: accept very late isochronous URBs
Commit 24f531371de1 (USB: EHCI: accept very late isochronous URBs)
changed the isochronous API provided by ehci-hcd. URBs submitted too
late, so that the time slots for all their packets have already
expired, are no longer rejected outright. Instead the submission is
accepted, and the URB completes normally with a -EXDEV error for each
packet. This is what client drivers expect.
This patch implements the same policy in uhci-hcd. It should be
applied to all kernels containing commit c44b225077bb (UHCI: implement
new semantics for URB_ISO_ASAP).
Alan Stern [Tue, 24 Sep 2013 19:48:05 +0000 (15:48 -0400)]
USB: iMX21: accept very late isochronous URBs
Commit 24f531371de1 (USB: EHCI: accept very late isochronous URBs)
changed the isochronous API provided by ehci-hcd. URBs submitted too
late, so that the time slots for all their packets have already
expired, are no longer rejected outright. Instead the submission is
accepted, and the URB completes normally with a -EXDEV error for each
packet. This is what client drivers expect.
The same policy should be implemented in imx21-hcd, but I don't know
enough about the hardware to do it. As a second-best substitute, this
patch treats very late isochronous submissions as though the
URB_ISO_ASAP flag were set. I don't have any way to test this change,
unfortunately.
Merge tag 'for-usb-linus-2013-09-23' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci into usb-linus
Sarah writes:
xhci: Bug fixes for 3.12.
Hi Greg,
Here's five bug fixes for 3.12.
The first two bugs fix issues with the command cancellation handling,
which can lead to oopses or the xHCI driver attempting to handle
previously-completed transfers. People have been running into oopses
and odd behavior with command cancellation for a couple kernel releases,
so they're marked for stable.
The third patch fixes an issue with USB remote wakeup under xHCI that
can only be reproduced under ChromeOS. As discussed, this fix is not
urgent, and isn't marked for stable.
The fourth patch fixes a race condition between URB cancellation and
userspace clearing an endpoint stall. The fifth patch removes some
annoying dmesg spam when a USB 3.0 device is disconnected, by avoiding
sending a Set SEL request.
Since commit b5dc0d10 (drm/imx: kill firstopen callback) the following probe
failure is seen:
[drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[drm] No driver support for vblank timestamp query.
[drm] Initialized imx-drm 1.0.0 20120507 on minor 0
imx-ldb ldb.10: adding encoder failed with -16
imx-ldb: probe of ldb.10 failed with error -16
imx-ipuv3 2400000.ipu: IPUv3H probed
imx-ipuv3 2800000.ipu: IPUv3H probed
imx-ipuv3-crtc imx-ipuv3-crtc.0: adding crtc failed with -16.
imx-ipuv3-crtc: probe of imx-ipuv3-crtc.0 failed with error -16
imx-ipuv3-crtc imx-ipuv3-crtc.1: adding crtc failed with -16.
imx-ipuv3-crtc: probe of imx-ipuv3-crtc.1 failed with error -16
imx-ipuv3-crtc imx-ipuv3-crtc.2: adding crtc failed with -16.
imx-ipuv3-crtc: probe of imx-ipuv3-crtc.2 failed with error -16
imx-ipuv3-crtc imx-ipuv3-crtc.3: adding crtc failed with -16.
imx-ipuv3-crtc: probe of imx-ipuv3-crtc.3 failed with error -16
The reason for the probe failure is that now 'imxdrm->references' is incremented
early in imx_drm_driver_load(), so the following checks in imx_drm_add_crtc()
and imx_drm_add_encoder():
if (imxdrm->references) {
ret = -EBUSY;
goto err_busy;
}
,will always fail.
Instead of manually keeping the references in the imx-drm driver, let's use
drm->open_count.
After this patch, lvds panel is functional on a mx6qsabrelite board.
staging: vt6656: [BUG] main_usb.c oops on device_close move flag earlier.
The vt6656 is prone to resetting on the usb bus.
It seems there is a race condition and wpa supplicant is
trying to open the device via iw_handlers before its actually
closed at a stage that the buffers are being removed.
The device is longer considered open when the
buffers are being removed. So move ~DEVICE_FLAGS_OPENED
flag to before freeing the device buffers.
MAINTAINERS: staging: dgnc and dgap drivers: add maintainer
This patch adds the staging/dgnc [DIGI NEO AND CLASSIC
PCI PRODUCTS] and staging/dgap [DIGI EPCA PCI PRODUCTS]
drivers to the MAINTAINERS file. I am listed as the
maintainer and the driverdev-devel list is the mailing
list for these drivers.
Merge tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen fixes from Konrad Rzeszutek Wilk:
"Bug-fixes and one update to the kernel-paramters.txt documentation.
- Fix PV spinlocks triggering jump_label code bug
- Remove extraneous code in the tpm front driver
- Fix ballooning out of pages when non-preemptible
- Fix deadlock when using a 32-bit initial domain with large amount
of memory
- Add xen_nopvpsin parameter to the documentation"
* tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/spinlock: Document the xen_nopvspin parameter.
xen/p2m: check MFN is in range before using the m2p table
xen/balloon: don't alloc page while non-preemptible
xen: Do not enable spinlocks before jump_label_init() has executed
tpm: xen-tpmfront: Remove the locality sysfs attribute
tpm: xen-tpmfront: Fix default durations
i915: bail out earlier when shrinker cannot acquire mutex
SHRINK_STOP was added to tell the core shrinker code to bail out and
go to the next shrinker since the i915 shrinker couldn't acquire
required locks. But the SHRINK_STOP return code was added to the
->count_objects callback and not the ->scan_objects callback as it
should have been, resulting in tons of dmesg noise like
shrink_slab: i915_gem_inactive_scan+0x0/0x9c negative objects to delete nr=-xxxxxxxxx
Merge tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device-mapper fixes from Mike Snitzer:
"A few fixes for dm-snapshot, a 32 bit fix for dm-stats, a couple error
handling fixes for dm-multipath. A fix for the thin provisioning
target to not expose non-zero discard limits if discards are disabled.
Lastly, add two DM module parameters which allow users to tune the
emergency memory reserves that DM mainatins per device -- this helps
fix a long-standing issue for dm-multipath. The conservative default
reserve for request-based dm-multipath devices (256) has proven
problematic for users with many multipathed SCSI devices but
relatively little memory. To responsibly select a smaller value users
should use the new nr_bios tracepoint info (via commit 75afb352
"block: Add nr_bios to block_rq_remap tracepoint") to determine the
peak number of bios their workloads create"
* tag 'dm-3.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: add reserved_bio_based_ios module parameter
dm: add reserved_rq_based_ios module parameter
dm: lower bio-based mempool reservation
dm thin: do not expose non-zero discard limits if discards disabled
dm mpath: disable WRITE SAME if it fails
dm-snapshot: fix performance degradation due to small hash size
dm snapshot: workaround for a false positive lockdep warning
dm stats: fix possible counter corruption on 32-bit systems
dm mpath: do not fail path on -ENOSPC
- A small cleanup the main purpose of which is to work around an
internal compiler error bug in certain Codesource toolchains.
* git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: mm: Move some checks out of 'for' loop in DMA operations
MIPS: cpu-features.h: s/MIPS53/MIPS64/
Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
"Here are a few things for -rc2, this time it's all written by me so it
can only be perfect .... right ? :)
So we have the fix to call irq_enter/exit on the irq stack we've been
discussing, plus a cleanup on top to remove an unused (and broken)
stack limit tracking feature (well, make it 32-bit only in fact where
it is used and works properly).
Then we have two things that I wrote over the last couple of days and
made the executive decision to include just because I can (and I'm
sure you won't object .... right ?).
They fix a couple of annoying and long standing "issues":
- We had separate zImages for when booting via Open Firmware vs.
booting via a flat device-tree, while it's trivial to make one that
deals with both
- We wasted a ton of cycles spinning secondary CPUs uselessly at boot
instead of starting them when needed on pseries, thus contributing
significantly to global warming"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc/pseries: Do not start secondaries in Open Firmware
powerpc/zImage: make the "OF" wrapper support ePAPR boot
powerpc: Remove ksp_limit on ppc64
powerpc/irq: Run softirqs off the top of the irq stack
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Three small fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/balancing: Fix cfs_rq->task_h_load calculation
sched/balancing: Fix 'local->avg_load > busiest->avg_load' case in fix_small_imbalance()
sched/balancing: Fix 'local->avg_load > sds->avg_load' case in calculate_imbalance()
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Assorted standalone fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Add model number for Avoton Silvermont
perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
perf: Update ABI comment
tools lib lk: Uninclude linux/magic.h in debugfs.c
perf tools: Fix old GCC build error in trace-event-parse.c:parse_proc_kallsyms()
perf probe: Fix finder to find lines of given function
perf session: Check for SIGINT in more loops
perf tools: Fix compile with libelf without get_phdrnum
perf tools: Fix buildid cache handling of kallsyms with kcore
perf annotate: Fix objdump line parsing offset validation
perf tools: Fill in new definitions for madvise()/mmap() flags
perf tools: Sharpen the libaudit dependencies test
My old @it.uu.se email address is going away, so update relevant
files to point to my @gmail.com address instead. In sata_promise.c
just delete the address, people can get it from MAINTAINERS.
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes from Arnaldo Carvalho de Melo:
* It was possible to use an uninitialized buffer when reading
kernel modules information and checking if the file was a
/proc/sys/kernel/kptr_restrict'ed one, fix for this from
Adrian Hunter.
* The libbfd demangler doesn't handle cloned functions (e.g. symbol.clone.NUM),
feed it unsuffixed symbol names, workaround from Andi Kleen.
* Fix segfault in 'perf trace' when processing perf.data files with PERF_RECORD_MMAP2
records, recently added but not handled in this tool, from David Ahern.
* Fix libdl related build in old systems like Fedora 12, from David Ahern.
* Make 'perf kmem' work again on non NUMA machines, fix from Jiri Olsa.
* Fix probing symbols with optimization suffix in 'perf probe' where some
operations that are entirely user level and involves vmlinux/DWARF were working
but when the symbol name was fed to the kprobes tracer, the in kernel code
would use /proc/kallsyms where the name had the suffix, from Masami Hiramatsu.
perf probe: Fix probing symbols with optimization suffix
Fix perf probe to probe on some symbols which have some optimzation
suffixes, e.g. ".part", ".isra", and ".constprop".
To fix this issue, instead of using the DIE name, perf probe uses the
symbol name found by dwfl_module_addrsym().
This also involves a perf probe --vars operation update which now shows
the symbol name instead of the DIE name.
Without this patch, putting a probe on an inlined function which was
compiled with a suffixed symbol will fail like this:
$ perf probe -v getname_flags
probe-definition(0): getname_flags
symbol:getname_flags file:(null) line:0 offset:0 return:0 lazy:(null)
0 arguments
Looking at the vmlinux_path (6 entries long)
Using /lib/modules/3.11.0+/build/vmlinux for symbols
found inline addr: 0xffffffff8119bb70
Probe point found: getname_flags+0
found inline addr: 0xffffffff8119bcb6
Probe point found: getname+6
found inline addr: 0xffffffff811a06a6
Probe point found: user_path_at_empty+6
find 3 probe_trace_events.
Opening /sys/kernel/debug//tracing/kprobe_events write=1
Added new events:
Writing event: p:probe/getname_flags getname_flags+0
Failed to write event: No such file or directory
Error: Failed to add events. (-1)
Because the debuginfo knows only the original (non suffix) symbol name,
it uses the original symbol for probe address but the kernel (kallsyms)
knows only suffixed symbol. Then, the kernel rejects that original
symbol.
This patch uses dwfl_module_addrsym() to get the correct (suffixed)
symbol from symtab when a probe point is found.
Jayachandran C [Wed, 25 Sep 2013 13:01:05 +0000 (18:31 +0530)]
MIPS: mm: Move some checks out of 'for' loop in DMA operations
The check cpu_needs_post_dma_flush() in mips_dma_sync_sg_for_cpu() and
the check !plat_device_is_coherent() in mips_dma_sync_sg_for_device()
can be moved outside the for loop.
As a side effect, this also avoids a GCC bug that caused kernel compile
to fail with the error:
arch/mips/mm/dma-default.c: In function 'mips_dma_sync_sg_for_cpu':
arch/mips/mm/dma-default.c:316:1: internal compiler error: in add_insn_before, at emit-rtl.c:3852
This gcc failure is seen in Code Sourcery toolchains [e.g. gcc version
4.7.2 (Sourcery CodeBench Lite 2012.09-99)] after commit "MIPS: Optimize
current_cpu_type() for better code."
xen/spinlock: Document the xen_nopvspin parameter.
Which disables in the ticketlock slowpath the Xen PV optimization's.
Useful for diagnosing issues and comparing benchmarks in
over-commit CPU scenarios.
David Vrabel [Fri, 13 Sep 2013 14:13:30 +0000 (15:13 +0100)]
xen/p2m: check MFN is in range before using the m2p table
On hosts with more than 168 GB of memory, a 32-bit guest may attempt
to grant map an MFN that is error cannot lookup in its mapping of the
m2p table. There is an m2p lookup as part of m2p_add_override() and
m2p_remove_override(). The lookup falls off the end of the mapped
portion of the m2p and (because the mapping is at the highest virtual
address) wraps around and the lookup causes a fault on what appears to
be a user space address.
do_page_fault() (thinking it's a fault to a userspace address), tries
to lock mm->mmap_sem. If the gntdev device is used for the grant map,
m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
already locked. do_page_fault() then deadlocks.
The deadlock would most commonly occur when a 64-bit guest is started
and xenconsoled attempts to grant map its console ring.
Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
mapped portion of the m2p table before accessing the table and use
this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
(which already had the correct range check).
All faults caused by accessing the non-existant parts of the m2p are
thus within the kernel address space and exception_fixup() is called
without trying to lock mm->mmap_sem.
This means that for MFNs that are outside the mapped range of the m2p
then mfn_to_pfn() will always look in the m2p overrides. This is
correct because it must be a foreign MFN (and the PFN in the m2p in
this case is only relevant for the other domain).
Signed-off-by: David Vrabel <[email protected]> Cc: Stefano Stabellini <[email protected]> Cc: Jan Beulich <[email protected]>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
range as it's probably foreign. Signed-off-by: Konrad Rzeszutek Wilk <[email protected]> Acked-by: Stefano Stabellini <[email protected]>
powerpc/pseries: Do not start secondaries in Open Firmware
Starting secondary CPUs early on from Open Firmware and placing them
in a holding spin loop slows down the boot process significantly under
some hypervisors such as KVM.
This is also unnecessary when RTAS supports querying the CPU state
powerpc/zImage: make the "OF" wrapper support ePAPR boot
This makes the "OF" zImage wrapper (zImage.pseries, zImage.pmac,
zImage.maple) work if booted via a flat device-tree (ePAPR boot
mode), and thus potentially usable with kexec.
We've been keeping that field in thread_struct for a while, it contains
the "limit" of the current stack pointer and is meant to be used for
detecting stack overflows.
It has a few problems however:
- First, it was never actually *used* on 64-bit. Set and updated but
not actually exploited
- When switching stack to/from irq and softirq stacks, it's update
is racy unless we hard disable interrupts, which is costly. This
is fine on 32-bit as we don't soft-disable there but not on 64-bit.
Thus rather than fixing 2 in order to implement 1 in some hypothetical
future, let's remove the code completely from 64-bit. In order to avoid
a clutter of ifdef's, we remove the updates from C code completely
during interrupt stack switching, and instead maintain it from the
asm helper that is used to do the stack switching in the first place.
Paul E. McKenney [Wed, 25 Sep 2013 01:29:11 +0000 (18:29 -0700)]
mm: Place preemption point in do_mlockall() loop
There is a loop in do_mlockall() that lacks a preemption point, which
means that the following can happen on non-preemptible builds of the
kernel. Dave Jones reports:
"My fuzz tester keeps hitting this. Every instance shows the non-irq
stack came in from mlockall. I'm only seeing this on one box, but
that has more ram (8gb) than my other machines, which might explain
it.
INFO: rcu_preempt self-detected stall on CPU { 3} (t=6500 jiffies g=470344 c=470343 q=0)
sending NMI to all CPUs:
NMI backtrace for cpu 3
CPU: 3 PID: 29664 Comm: trinity-child2 Not tainted 3.11.0-rc1+ #32
Call Trace:
lru_add_drain_all+0x15/0x20
SyS_mlockall+0xa5/0x1a0
tracesys+0xdd/0xe2"
This commit addresses this problem by inserting the required preemption
point.
Rob Herring [Wed, 4 Sep 2013 15:52:57 +0000 (10:52 -0500)]
of: clean-up ifdefs in of_irq.h
Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the
amount ifdef'ed code. This fixes some build warnings when CONFIG_OF
is not enabled (seen on i386 and x86_64):
include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default]
include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default]
Rob Herring [Fri, 6 Sep 2013 21:49:18 +0000 (16:49 -0500)]
openrisc: clean-up prom.h
Clean-up some copy/paste declarations that are not necessary. All the
functions either don't exist or are already declared in other headers.
This is needed in preparation of of_irq.h clean-up.
cpufreq: check cpufreq driver is valid and cpufreq isn't disabled in cpufreq_get()
cpufreq_get() can be called from external drivers which might not be aware if
cpufreq driver is registered or not. And so we should actually check if cpufreq
driver is registered or not and also if cpufreq is active or disabled, at the
beginning of cpufreq_get().
Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).
Yinghai Lu [Fri, 20 Sep 2013 17:43:56 +0000 (10:43 -0700)]
acpi-cpufreq: skip loading acpi_cpufreq after intel_pstate
If the hw supports intel_pstate and acpi_cpufreq, intel_pstate will
get loaded first.
acpi_cpufreq_init() will call acpi_cpufreq_early_init()
and that will allocate perf data and init those perf data in ACPI core,
(that will cover all CPUs). But later it will free them as
cpufreq_register_driver(acpi_cpufreq) will fail as intel_pstate is
already registered
Use cpufreq_get_current_driver() to check if we can skip the
acpi_cpufreq loading.
We were getting occasional "Scheduling while atomic" call traces
during boot on some systems. Problem was first seen on a Cisco C210
but we were able to reproduce it on a Cisco c220m3. Setting
CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around
tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device.
I clearly need to be more aware of Andrew's racing schedule.
* akpm:
MAINTAINERS: update mach-bcm related email address
checkpatch: make extern in .h prototypes quieter
cciss: fix info leak in cciss_ioctl32_passthru()
cpqarray: fix info leak in ida_locked_ioctl()
kernel/reboot.c: re-enable the function of variable reboot_default
audit: fix endless wait in audit_log_start()
revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code"
revert "memcg: get rid of soft-limit tree infrastructure"
revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim"
revert "memcg: enhance memcg iterator to support predicates"
revert "memcg: track children in soft limit excess to improve soft limit"
revert "memcg, vmscan: do not attempt soft limit reclaim if it would not scan anything"
revert "memcg: track all children over limit in the root"
revert "memcg, vmscan: do not fall into reclaim-all pass too quickly"
fs/ocfs2/super.c: use a bigger nodestr in ocfs2_dismount_volume
watchdog: update watchdog_thresh properly
watchdog: update watchdog attributes atomically
Dan Carpenter [Tue, 24 Sep 2013 22:27:45 +0000 (15:27 -0700)]
cciss: fix info leak in cciss_ioctl32_passthru()
The arg64 struct has a hole after ->buf_size which isn't cleared. Or if
any of the calls to copy_from_user() fail then that would cause an
information leak as well.
Chuansheng Liu [Tue, 24 Sep 2013 22:27:43 +0000 (15:27 -0700)]
kernel/reboot.c: re-enable the function of variable reboot_default
Commit 1b3a5d02ee07 ("reboot: move arch/x86 reboot= handling to generic
kernel") did some cleanup for reboot= command line, but it made the
reboot_default inoperative.
The default value of variable reboot_default should be 1, and if command
line reboot= is not set, system will use the default reboot mode.
Michal Hocko [Tue, 24 Sep 2013 22:27:30 +0000 (15:27 -0700)]
watchdog: update watchdog_thresh properly
watchdog_tresh controls how often nmi perf event counter checks per-cpu
hrtimer_interrupts counter and blows up if the counter hasn't changed
since the last check. The counter is updated by per-cpu
watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
period which guarantees that hrtimer is scheduled 2 times per the main
period. Both hrtimer and perf event are started together when the
watchdog is enabled.
So far so good. But...
But what happens when watchdog_thresh is updated from sysctl handler?
proc_dowatchdog will set a new sampling period and hrtimer callback
(watchdog_timer_fn) will use the new value in the next round. The
problem, however, is that nobody tells the perf event that the sampling
period has changed so it is ticking with the period configured when it
has been set up.
This might result in an ear ripping dissonance between perf and hrtimer
parts if the watchdog_thresh is increased. And even worse it might lead
to KABOOM if the watchdog is configured to panic on such a spurious
lockup.
This patch fixes the issue by updating both nmi perf even counter and
hrtimers if the threshold value has changed.
The nmi one is disabled and then reinitialized from scratch. This has
an unpleasant side effect that the allocation of the new event might
fail theoretically so the hard lockup detector would be disabled for
such cpus. On the other hand such a memory allocation failure is very
unlikely because the original event is deallocated right before.
It would be much nicer if we just changed perf event period but there
doesn't seem to be any API to do that right now. It is also unfortunate
that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
cannot use on_each_cpu() and do the same thing from the per-cpu context.
The update from the current CPU should be safe because
perf_event_disable removes the event atomically before it clears the
per-cpu watchdog_ev so it cannot change anything under running handler
feet.
The hrtimer is simply restarted (thanks to Don Zickus who has pointed
this out) if it is queued because we cannot rely it will fire&adopt to
the new sampling period before a new nmi event triggers (when the
treshold is decreased).
Michal Hocko [Tue, 24 Sep 2013 22:27:29 +0000 (15:27 -0700)]
watchdog: update watchdog attributes atomically
proc_dowatchdog doesn't synchronize multiple callers which might lead to
confusion when two parallel callers might confuse watchdog_enable_all_cpus
resp watchdog_disable_all_cpus (eg watchdog gets enabled even if
watchdog_thresh was set to 0 already).
This patch adds a local mutex which synchronizes callers to the sysctl
handler.
Merge branch 'bcache' (bcache fixes from Kent Overstreet)
Merge bcache fixes from Kent Overstreet:
"There's fixes for _three_ different data corruption bugs, all of which
were found by users hitting them in the wild.
The first one isn't bcache specific - in 3.11 bcache was switched to
the bio_copy_data in fs/bio.c, and that's when the bug in that code
was discovered, but it's also used by raid1 and pktcdvd. (That was my
code too, so the bug's doubly embarassing given that it was or
should've been just a cut and paste from bcache code. Dunno what
happened there).
Most of these (all the non data corruption bugs, actually) were ready
before the merge window and have been sitting in Jens' tree, but I
don't know what's been up with him lately..."
* emailed patches from Kent Overstreet <[email protected]>:
bcache: Fix flushes in writeback mode
bcache: Fix for handling overlapping extents when reading in a btree node
bcache: Fix a shrinker deadlock
bcache: Fix a dumb CPU spinning bug in writeback
bcache: Fix a flush/fua performance bug
bcache: Fix a writeback performance regression
bcache: Correct printf()-style format length modifier
bcache: Fix for when no journal entries are found
bcache: Strip endline when writing the label through sysfs
bcache: Fix a dumb journal discard bug
block: Fix bio_copy_data()
Kent Overstreet [Tue, 24 Sep 2013 06:17:36 +0000 (23:17 -0700)]
bcache: Fix flushes in writeback mode
In writeback mode, when we get a cache flush we need to make sure we
issue a flush to the backing device.
The code for sending down an extra flush was wrong - by cloning the bio
we were probably getting flags that didn't make sense for a bare flush,
and also the old code was firing for FUA bios, for which we don't need
to send a flush to the backing device.
This was causing data corruption somehow - the mechanism was never
determined, but this patch fixes it for the users that were seeing it.
Kent Overstreet [Tue, 24 Sep 2013 06:17:35 +0000 (23:17 -0700)]
bcache: Fix for handling overlapping extents when reading in a btree node
btree_sort_fixup() was overly clever, because it was trying to avoid
pulling a key off the btree iterator in more than one place.
This led to a really obscure bug where we'd break early from the loop in
btree_sort_fixup() if the current key overlapped with keys in more than
one older set, and the next key it overlapped with was zero size.
Kent Overstreet [Tue, 24 Sep 2013 06:17:31 +0000 (23:17 -0700)]
bcache: Fix a writeback performance regression
Background writeback works by scanning the btree for dirty data and
adding those keys into a fixed size buffer, then for each dirty key in
the keybuf writing it to the backing device.
When read_dirty() finishes and it's time to scan for more dirty data, we
need to wait for the outstanding writeback IO to finish - they still
take up slots in the keybuf (so that foreground writes can check for
them to avoid races) - without that wait, we'll continually rescan when
we'll be able to add at most a key or two to the keybuf, and that takes
locks that starves foreground IO. Doh.
bcache: Correct printf()-style format length modifier
Fix
drivers/md/bcache/btree.c: In function ‘bch_btree_node_read’:
drivers/md/bcache/btree.c:259: warning: format ‘%lu’ expects type ‘long unsigned int’, but argument 3 has type ‘size_t’