Stephen Warren [Sat, 29 Dec 2012 00:42:46 +0000 (17:42 -0700)]
ARM: dts: prevent *.dtb from always being rebuilt
if_changed (used by the *.dts->*.dtc rule) rebuilds files if they aren't
contained in $(targets). (make V=2 indicates this). Add $(dtb-y) to
$(targets) to prevent *.dtb from always being rebuilt.
This fixes a regression introduced by the .dtb rule rework in 499cd82
"ARM: dt: change .dtb build rules to build in dts directory".
Stephen Warren [Sat, 29 Dec 2012 00:42:47 +0000 (17:42 -0700)]
arm64: dts: prevent *.dtb from always being rebuilt
if_changed (used by the *.dts->*.dtc rule) rebuilds files if they aren't
contained in $(targets). (make V=2 indicates this). Add $(dtb-y) to
$(targets) to prevent *.dtb from always being rebuilt. Note
This fixes a regression introduced by the .dtb rule rework in da4cbc6
"arm64: use new common dtc rule", although since arm64 doesn't actually
have any *.dts yet, this isn't a critical issue.
Taking another look at the C400 descriptors, I see now that there is
a clock selector (0x80) for this device.
Right now, the clock source points to the internal clock (0x81), which
is also valid. When the external clock source (0x82) is selected in the
mixer, and the rates mismatch (if it's free-running it is fixed to
48KHz), xruns will occur.
Set the clock ID to the clock selector unit (0x81), which then
allows the validation code to function correctly.
ALSA: usb - fix race in creation of M-Audio Fast track pro driver
A patch in the 3.2 kernel caused regression with hotplugging the
M-Audio Fast track pro, or sound after suspend. I don't have the
device so I haven't done a full analysis, but it seems userspace
(both udev and pulseaudio) got confused when a card was created,
immediately destroyed, and then created again.
However, at least one person in the bug report (martin djfun)
reports that this patch resolves the issue for him. It also leaves
a message in the log:
"snd-usb-audio: probe of 1-1.1:1.1 failed with error -5" which is
a bit misleading. It is better than non-working audio, but maybe
there's a more elegant solution?
Nitin Gupta [Wed, 2 Jan 2013 16:53:41 +0000 (08:53 -0800)]
staging: zram: fix invalid memory references during disk write
Fixes a bug introduced by commit c8f2f0db1 ("zram: Fix handling
of incompressible pages") which caused invalid memory references
during disk write. Invalid references could occur in two cases:
- Incoming data expands on compression: In this case, reference was
made to kunmap()'ed bio page.
- Partial (non PAGE_SIZE) write with incompressible data: In this
case, reference was made to a kfree()'ed buffer.
Namjae Jeon [Sat, 12 Jan 2013 05:41:13 +0000 (14:41 +0900)]
f2fs: remove redundant call to set_blocksize in f2fs_fill_super
Since, f2fs supports only 4KB blocksize, which is set at the beginning in
f2fs_fill_super. So, we do not need to again check this blocksize setting
in such case.
Bjorn Helgaas [Fri, 11 Jan 2013 22:30:52 +0000 (15:30 -0700)]
PCI: shpchp: Make shpchp_wq non-ordered
e24dcbef93 ("shpchp: update workqueue usage") was described as adding
non-ordered shpchp_wq, but it actually made it an *ordered* workqueue.
This patch changes shpchp_wq to be non-ordered, as described in the e24dcbef93 commit log and as was done for pciehp by a827ea307b ("pciehp:
update workqueue usage").
The function aer_recover_queue() calls pci_get_domain_bus_and_slot(), which
requires that the caller decrement the reference count with pci_dev_put().
This patch adds the missing call to pci_dev_put().
Hans de Goede [Fri, 11 Jan 2013 11:08:57 +0000 (12:08 +0100)]
udldrmfb: udl_get_edid: usb_control_msg buffer must not be on the stack
The buffer passed to usb_control_msg may end up in scatter-gather list, and
may thus not be on the stack. Having it on the stack usually works on x86, but
not on other archs.
Hans de Goede [Fri, 11 Jan 2013 11:08:56 +0000 (12:08 +0100)]
udldrmfb: Fix EDID not working with monitors with EDID extension blocks
udldrmfb only reads the main EDID block, and if that advertises extensions
the drm_edid code expects them to be present, and starts reading beyond the
buffer udldrmfb passes it.
Although it may be possible to read more EDID info with the udl we simpy don't
know how, and even if trial and error gets it working on one device, that is
no guarantee it will work on other revisions. So this patch does a simple fix
in the form of patching the EDID info to report 0 extension blocks, this
fixes udldrmfb only doing 1024x768 on monitors with EDID extension blocks.
Dave Airlie [Sun, 13 Jan 2013 22:16:54 +0000 (08:16 +1000)]
Merge branch 'drm-fixes-3.8' of git://people.freedesktop.org/~agd5f/linux into drm-next
Fixes for UMS mode which has been broken for a while plus an rn50 fix
and a dma fix.
* 'drm-fixes-3.8' of git://people.freedesktop.org/~agd5f/linux:
radeon/kms: fix dma relocation checking
radeon/kms: force rn50 chip to always report connected on analog output
drm/radeon: fix error path in kpage allocation
drm/radeon: fix a bogus kfree
drm/radeon: fix NULL pointer dereference in UMS mode
Dave Airlie [Sun, 13 Jan 2013 22:15:36 +0000 (08:15 +1000)]
Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
Regression fixes since rework mostly.
* 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
drm/nvc0/fb: fix crash when different mutex is used to protect same list
drm/nouveau/clock: fix support for more than 2 monitors on nve0
drm/nv50/disp: fix selection of bios script for analog outputs
drm/nv17-50: restore fence buffer on resume
drm/nouveau: fix blank LVDS screen regression on pre-nv50 cards
drm/nouveau: fix nouveau_client allocation failure path
drm/nouveau: don't return freed object from nouveau_handle_create
drm/nouveau/vm: fix memory corruption when pgt allocation fails
drm/nouveau: add locking around instobj list operations
drm/nouveau: do not forcibly power on lvds panels
drm/nouveau/devinit: ensure legacy vga control is enabled during post
usb: ftdi_sio: Crucible Technologies COMET Caller ID - pid added
Simple fix to add support for Crucible Technologies COMET Caller ID
USB decoder - a device containing FTDI USB/Serial converter chip,
handling 1200bps CallerID messages decoded from the phone line -
adding correct USB PID is sufficient.
Tested to apply cleanly and work flawlessly against 3.6.9, 3.7.0-rc8
and 3.8.0-rc3 on both amd64 and x86 arches.
Aleksi Torhamo [Wed, 9 Jan 2013 18:08:48 +0000 (20:08 +0200)]
drm/nvc0/fb: fix crash when different mutex is used to protect same list
Fixes regression introduced in commit 861d2107
"drm/nouveau/fb: merge fb/vram and port to subdev interfaces"
nv50_fb_vram_{new,del} functions were changed to use
nouveau_subdev->mutex instead of the old nouveau_mm->mutex.
nvc0_fb_vram_new still uses the nouveau_mm->mutex, but nvc0 doesn't
have its own fb_vram_del function, using nv50_fb_vram_del instead.
Because of this, on nvc0 a different mutex ends up being used to protect
additions and deletions to the same list.
Aleksi Torhamo [Fri, 4 Jan 2013 16:39:13 +0000 (18:39 +0200)]
drm/nouveau/clock: fix support for more than 2 monitors on nve0
Fixes regression introduced in commit 70790f4f
"drm/nouveau/clock: pull in the implementation from all over the place"
When code was moved from nv50_crtc_set_clock to nvc0_clock_pll_set,
the PLLs it is used for got limited to only the first two VPLLs.
nv50_crtc_set_clock was only called to change VPLLs, so it didn't
limit what it was used for in any way. Since nvc0_clock_pll_set is
used for all PLLs, it has to specify which PLLs the code is used for,
and only listed the first two VPLLs.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=58735
This patch is a -stable candidate for 3.7.
Marcin Slusarz [Thu, 3 Jan 2013 18:38:45 +0000 (19:38 +0100)]
drm/nv50/disp: fix selection of bios script for analog outputs
Analog output number was overwritten by value from digital output path.
Fix it.
Fixes resume from s2ram: https://bugs.freedesktop.org/show_bug.cgi?id=58729
(as stumbled on by J Binder, Pontus Fuchs and me)
Fixes blank screen on module load (reported by Sune Mølgaard).
Marcin Slusarz [Tue, 25 Dec 2012 17:13:22 +0000 (18:13 +0100)]
drm/nv17-50: restore fence buffer on resume
Since commit 5e120f6e4b3f35b741c5445dfc755f50128c3c44 "drm/nouveau/fence:
convert to exec engine, and improve channel sync" nouveau fence sync
implementation for nv17-50 and nvc0+ started to rely on state of fence buffer
left by previous sync operation. But as pinned bo's (where fence state is
stored) are not saved+restored across suspend/resume, we need to do it
manually.
Marcin Slusarz [Tue, 18 Dec 2012 19:30:47 +0000 (20:30 +0100)]
drm/nouveau: fix blank LVDS screen regression on pre-nv50 cards
Commit 2a44e499 ("drm/nouveau/disp: introduce proper init/fini, separate
from create/destroy") started to call display init routines on pre-nv50
hardware on module load. But LVDS init code sets driver state in a way
which prevents modesetting code from operating properly.
nv04_display_init calls nv04_dfp_restore, which sets encoder->last_dpms to
NV_DPMS_CLEARED.
nv04_lvds_dpms checks last_dpms mode (which is NV_DPMS_CLEARED) and wrongly
assumes it's a "powersaving mode", the new one (DRM_MODE_DPMS_OFF) is too,
so it skips calling some crucial lvds scripts.
Ben Skeggs [Mon, 10 Dec 2012 08:53:43 +0000 (18:53 +1000)]
drm/nouveau: do not forcibly power on lvds panels
This fix was put in place to fix a bug where the eDP panel on certain
laptops fails to respond over the aux channel after suspend.
It appears that on some systems (Dell M6600, with LVDS panel) there's a
very bad interaction with the eDP init table that causes the SOR to get
very confused and not drive the panel correctly, leading to bleed.
A DPMS off/on cycle is enough to bring it back, but, this will avoid the
problem by not touching the panel GPIOs at times we're not meant to.
Sathya Perla [Fri, 11 Jan 2013 22:47:02 +0000 (22:47 +0000)]
be2net: fix unconditionally returning IRQ_HANDLED in INTx
commit e49cc34f introduced an unconditional IRQ_HANDLED return in be_intx()
to workaround Lancer and BE2 HW issues. This is bad as it prevents the kernel
from detecting interrupt storms due to broken HW.
The BE2/Lancer HW issues are:
1) In Lancer, there is no means for the driver to detect if the interrupt
belonged to device, other than counting and notifying events.
2) In Lancer de-asserting INTx takes a while, causing the INTx irq handler
to be called multiple times till the de-assert happens.
3) In BE2, we see an occasional interrupt even when EQs are unarmed.
Issue (1) can cause the notified events to be orphaned, if NAPI was already
running.
This patch fixes this issue by scheduling NAPI only if it is not scheduled
already. Doing this also takes care of possible events_get() race that may be
caused due to issue (2) and (3). Also, IRQ_HANDLED is returned only the first
time zero events are detected.
(Thanks Ben H. for the feedback and suggestions.)
Yijing Wang [Fri, 11 Jan 2013 02:15:54 +0000 (10:15 +0800)]
PCI: pciehp: Use per-slot workqueues to avoid deadlock
When we have a hotplug-capable PCIe port with a second hotplug-capable
PCIe port below it, removing the device below the upstream port causes
a deadlock.
The deadlock happens because we use the pciehp_wq workqueue to run
pciehp_power_thread(), which uses pciehp_disable_slot() to remove devices
below the upstream port. When we remove the downstream PCIe port, we call
pciehp_remove(), the pciehp driver's .remove() method. That calls
flush_workqueue(pciehp_wq), which deadlocks because the
pciehp_power_thread() work item is still running.
This patch avoids the deadlock by creating a workqueue for every PCIe port
and removing the single shared workqueue.
The xfs module uses a lot of tracepoint, with TRACEPOINTS=y and a
few debugging options the GOT table of the xfs module will get
bigger than 4K. To get a working xfs module it needs to be compiled
with -fPIC instead of -fpic. To play safe use -fPIC for all modules.
Gerald Schaefer [Wed, 9 Jan 2013 17:49:51 +0000 (18:49 +0100)]
s390/mm: fix pmd_pfn() for thp
The pfn calculation in pmd_pfn() is broken for thp, because it uses
HPAGE_SHIFT instead of the normal PAGE_SHIFT. This is fixed by removing
the distinction between thp and normal pmds in that function, and always
using PAGE_SHIFT.
Yinghai Lu [Sat, 12 Jan 2013 13:00:06 +0000 (14:00 +0100)]
ACPI / glue: Fix build with ACPI_GLUE_DEBUG set
If ACPI_GLUE_DEBUG is different from 0 (setting this requires a
manual change of glue.c), build breaks because of a leftover
reference to dev->acpi_handle in acpi_platform_notify(). Fix this
by using ACPI_HANDLE(dev) instead as appropriate.
Jason Wang [Fri, 11 Jan 2013 16:59:34 +0000 (16:59 +0000)]
tuntap: fix leaking reference count
Reference count leaking of both module and sock were found:
- When a detached file were closed, its sock refcnt from device were not
released, solving this by add the sock_put().
- The module were hold or drop unconditionally in TUNSETPERSIST, which means we
if we set the persist flag for N times, we need unset it for another N
times. Solving this by only hold or drop an reference when there's a flag
change and also drop the reference count when the persist device is deleted.
Jason Wang [Fri, 11 Jan 2013 16:59:33 +0000 (16:59 +0000)]
tuntap: forbid calling TUNSETIFF when detached
Michael points out that even after Stefan's fix the TUNSETIFF is still allowed
to create a new tap device. This because we only check tfile->tun but the
tfile->detached were introduced. Fix this by failing early in tun_set_iff() if
the file is detached. After this fix, there's no need to do the check again in
tun_set_iff(), so this patch removes it.
Rusty Russell [Sat, 12 Jan 2013 02:57:34 +0000 (13:27 +1030)]
module: put modules in list much earlier.
Prarit's excellent bug report:
> In recent Fedora releases (F17 & F18) some users have reported seeing
> messages similar to
>
> [ 15.478160] kvm: Could not allocate 304 bytes percpu data
> [ 15.478174] PERCPU: allocation failed, size=304 align=32, alloc from
> reserved chunk failed
>
> during system boot. In some cases, users have also reported seeing this
> message along with a failed load of other modules.
>
> What is happening is systemd is loading an instance of the kvm module for
> each cpu found (see commit e9bda3b). When the module load occurs the kernel
> currently allocates the modules percpu data area prior to checking to see
> if the module is already loaded or is in the process of being loaded. If
> the module is already loaded, or finishes load, the module loading code
> releases the current instance's module's percpu data.
Now we have a new state MODULE_STATE_UNFORMED, we can insert the
module into the list (and thus guarantee its uniqueness) before we
allocate the per-cpu region.
Felipe Balbi [Thu, 10 Jan 2013 12:32:39 +0000 (14:32 +0200)]
usb: host: ohci-tmio: fix compile warning
Fix the following compile warning:
In file included from drivers/usb/host/ohci-hcd.c:1170:0:
drivers/usb/host/ohci-tmio.c: In function 'tmio_start_hc':
drivers/usb/host/ohci-tmio.c:130:2: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 4 has type 'resource_size_t' [-Wformat]
fsl-ehci probing fails on mpc5121e:
...
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
fsl-ehci fsl-ehci.0: Freescale On-Chip EHCI Host Controller
fsl-ehci fsl-ehci.0: new USB bus registered, assigned bus number 1
fsl-ehci fsl-ehci.0: Could not get controller version
fsl-ehci fsl-ehci.0: can't setup
fsl-ehci fsl-ehci.0: USB bus 1 deregistered
fsl-ehci fsl-ehci.0: init fsl-ehci.0 fail, -22
fsl-ehci: probe of fsl-ehci.0 failed with error -22
Fix it by returning appropriate version info for mpc5121, too.
Maxime Ripard [Fri, 11 Jan 2013 17:06:21 +0000 (18:06 +0100)]
USB: select USB_ARCH_HAS_EHCI for MXS
Commit 09f6ffde (USB: EHCI: fix build error by making ChipIdea host a
normal EHCI driver) introduced a dependency on USB_EHCI_HCD for the
chipidea USB host driver, that in turns depends on USB_ARCH_HAS_EHCI.
If this symbol is not set for MXS, the MXS boards are not able to use
the chipidea driver anymore.
wireless core does not correctly assign ethtool_ops.
After alloc_netdev*() call, some cfg80211 drivers provide they own
ethtool_ops, but some do not. For them, wireless core provide generic
cfg80211_ethtool_ops, which is assigned in NETDEV_REGISTER notify call:
if (!dev->ethtool_ops)
dev->ethtool_ops = &cfg80211_ethtool_ops;
But after Eric's commit, dev->ethtool_ops is no longer NULL (on cfg80211
drivers without custom ethtool_ops), but points to &default_ethtool_ops.
In order to fix the problem, provide function which will overwrite
default_ethtool_ops and use it by wireless core.
Amerigo Wang [Thu, 10 Jan 2013 22:52:54 +0000 (22:52 +0000)]
qlge: remove NETIF_F_TSO6 flag
It is werid that qlge driver supports NETIF_F_TSO6 but
not NETIF_F_IPV6_CSUM. This also causes some kernel warning [1]
when VLAN device setups on a qlge interface.
I think the qlge hardware doesn't support NETIF_F_IPV6_CSUM,
so we have to just remove the NETIF_F_TSO6 flag.
After this patch, the TCP/IPv6 traffic becomes normal again,
no kernel warnings any more.
NOTE: I only tested it on 2.6.32 kernel, even if the upstream
kernel could fix this automatically (it is hard to track NETIF*
flags), removing it is also safe.
Linus Torvalds [Fri, 11 Jan 2013 22:56:41 +0000 (14:56 -0800)]
Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull a hwmon patch from Guenter Roeck:
"Fix build error in vexpress driver"
* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (vexpress) Fix build error seen if CONFIG_OF_DEVICE is not set
Linus Torvalds [Fri, 11 Jan 2013 22:55:15 +0000 (14:55 -0800)]
Merge branch 'akpm' (incoming fixes from Andrew)
Merge misc fixes from Andrew Morton:
"The audit fixes have been floating around for a while - Al and Eric
aren't responding to either myself or Kees so I asked Kees to
re-review them and here they are."
* emailed patches from Andrew Morton <[email protected]>: (22 commits)
lib/rbtree.c: avoid the use of non-static __always_inline
MAINTAINERS: Omar had moved
mm: compaction: partially revert capture of suitable high-order page
linux/audit.h: move ptrace.h include to kernel header
kernel/audit.c: avoid negative sleep durations
audit: catch possible NULL audit buffers
audit: create explicit AUDIT_SECCOMP event type
MAINTAINERS: fix a status pattern
MAINTAINERS: fix arch/arm/plat-omap/include/plat/omap_hwmod.h
mm: thp: acquire the anon_vma rwsem for write during split
mm: mmap: annotate vm_lock_anon_vma locking properly for lockdep
lockdep, rwsem: provide down_write_nest_lock()
arch/mn10300/Kconfig: select CONFIG_GENERIC_ATOMIC64
mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
mm: use aligned zone start for pfn_to_bitidx calculation
fs/exec.c: work around icc miscompilation
mm: compaction: fix echo 1 > compact_memory return error issue
mm: memblock: fix wrong memmove size in memblock_merge_regions()
drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
mm: migrate: check page_count of THP before migrating
...
lib/rbtree.c: avoid the use of non-static __always_inline
lib/rbtree.c declared __rb_erase_color() as __always_inline void, and
then exported it with EXPORT_SYMBOL.
This was because __rb_erase_color() must be exported for augmented
rbtree users, but it must also be inlined into rb_erase() so that the
dummy callback can get optimized out of that call site.
(Actually with a modern compiler, none of the dummy callback functions
should even be generated as separate text functions).
The above usage is legal C, but it was unusual enough for some compilers
to warn about it. This change makes things more explicit, with a static
__always_inline ____rb_erase_color function for use in rb_erase(), and a
separate non-inline __rb_erase_color function for use in
rb_erase_augmented call sites.
Mel Gorman [Fri, 11 Jan 2013 22:32:16 +0000 (14:32 -0800)]
mm: compaction: partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket. It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page
immediately when it is made available").
The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed. For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped. However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.
In retrospect the capture patch took the wrong approach. What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way. While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested. This patch
partially reverts commit 1fb3f8ca0e92 ("mm: compaction: capture a
suitable high-order page immediately when it is made available").
Mike Frysinger [Fri, 11 Jan 2013 22:32:13 +0000 (14:32 -0800)]
linux/audit.h: move ptrace.h include to kernel header
While the kernel internals want pt_regs (and so it includes
linux/ptrace.h), the user version of audit.h does not need it. So move
the include out of the uapi version.
This avoids issues where people want the audit defines and userland
ptrace api. Including both the kernel ptrace and the userland ptrace
headers can easily lead to failure.
Andrew Morton [Fri, 11 Jan 2013 22:32:11 +0000 (14:32 -0800)]
kernel/audit.c: avoid negative sleep durations
audit_log_start() performs the same jiffies comparison in two places.
If sufficient time has elapsed between the two comparisons, the second
one produces a negative sleep duration:
schedule_timeout: wrong timeout value fffffffffffffff0
Pid: 6606, comm: trinity-child1 Not tainted 3.8.0-rc1+ #43
Call Trace:
schedule_timeout+0x305/0x340
audit_log_start+0x311/0x470
audit_log_exit+0x4b/0xfb0
__audit_syscall_exit+0x25f/0x2c0
sysret_audit+0x17/0x21
Fix it by performing the comparison a single time.
Kees Cook [Fri, 11 Jan 2013 22:32:05 +0000 (14:32 -0800)]
audit: create explicit AUDIT_SECCOMP event type
The seccomp path was using AUDIT_ANOM_ABEND from when seccomp mode 1
could only kill a process. While we still want to make sure an audit
record is forced on a kill, this should use a separate record type since
seccomp mode 2 introduces other behaviors.
In the case of "handled" behaviors (process wasn't killed), only emit a
record if the process is under inspection. This change also fixes
userspace examination of seccomp audit events, since it was considered
malformed due to missing fields of the AUDIT_ANOM_ABEND event type.
Alexander Beregalov and Alex Xu reported similar bugs and Hillf Danton
identified that commit 5a505085f043 ("mm/rmap: Convert the struct
anon_vma::mutex to an rwsem") and commit 4fc3f1d66b1e ("mm/rmap,
migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable")
were likely the problem. Reverting these commits was reported to solve
the problem for Alexander.
Despite the reason for these commits, NUMA balancing is not the direct
source of the problem. split_huge_page() expects the anon_vma lock to
be exclusive to serialise the whole split operation. Ordinarily it is
expected that the anon_vma lock would only be required when updating the
avcs but THP also uses the anon_vma rwsem for collapse and split
operations where the page lock or compound lock cannot be used (as the
page is changing from base to THP or vice versa) and the page table
locks are insufficient.
This patch takes the anon_vma lock for write to serialise against parallel
split_huge_page as THP expected before the conversion to rwsem.
which is incomplete, and causes the false positive report from lockdep
below.
Annotate the fact that mmap_sem is used as an outter lock to serialize
taking of all the anon_vma rwsems at once no matter the order, using the
down_write_nest_lock() primitive.
This patch fixes this lockdep report:
=============================================
[ INFO: possible recursive locking detected ] 3.8.0-rc2-00036-g5f73896 #171 Not tainted
---------------------------------------------
qemu-kvm/2315 is trying to acquire lock:
(&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
but task is already holding lock:
(&anon_vma->rwsem){+.+...}, at: mm_take_all_locks+0x149/0x1b0
other info that might help us debug this:
Possible unsafe locking scenario:
Jiri Kosina [Fri, 11 Jan 2013 22:31:56 +0000 (14:31 -0800)]
lockdep, rwsem: provide down_write_nest_lock()
down_write_nest_lock() provides a means to annotate locking scenario
where an outer lock is guaranteed to serialize the order nested locks
are being acquired.
This is analogoue to already existing mutex_lock_nest_lock() and
spin_lock_nest_lock().
Max Filippov [Fri, 11 Jan 2013 22:31:52 +0000 (14:31 -0800)]
mm: bootmem: fix free_all_bootmem_core() with odd bitmap alignment
Currently free_all_bootmem_core ignores that node_min_pfn may be not
multiple of BITS_PER_LONG. Eg commit 6dccdcbe2c3e ("mm: bootmem: fix
checking the bitmap when finally freeing bootmem") shifts vec by lower
bits of start instead of lower bits of idx. Also
if (IS_ALIGNED(start, BITS_PER_LONG) && vec == ~0UL)
assumes that vec bit 0 corresponds to start pfn, which is only true when
node_min_pfn is a multiple of BITS_PER_LONG. Also loop in the else
clause can double-free pages (e.g. with node_min_pfn == start == 1,
map[0] == ~0 on 32-bit machine page 32 will be double-freed).
This bug causes the following message during xtensa kernel boot:
bootmem::free_all_bootmem_core nid=0 start=1 end=8000
BUG: Bad page state in process swapper pfn:00001
page:d04bd020 count:0 mapcount:-127 mapping: (null) index:0x2
page flags: 0x0()
Call Trace:
bad_page+0x8c/0x9c
free_pages_prepare+0x5e/0x88
free_hot_cold_page+0xc/0xa0
__free_pages+0x24/0x38
__free_pages_bootmem+0x54/0x56
free_all_bootmem_core$part$11+0xeb/0x138
free_all_bootmem+0x46/0x58
mem_init+0x25/0xa4
start_kernel+0x11e/0x25c
should_never_return+0x0/0x3be7
The fix is the following:
- always align vec so that its bit 0 corresponds to start
- provide BITS_PER_LONG bits in vec, if those bits are available in the
map
- don't free pages past next start position in the else clause.
Laura Abbott [Fri, 11 Jan 2013 22:31:51 +0000 (14:31 -0800)]
mm: use aligned zone start for pfn_to_bitidx calculation
The current calculation in pfn_to_bitidx assumes that (pfn -
zone->zone_start_pfn) >> pageblock_order will return the same bit for
all pfn in a pageblock. If zone_start_pfn is not aligned to
pageblock_nr_pages, this may not always be correct.
Consider the following with pageblock order = 10, zone start 2MB:
This means that calling {get,set}_pageblock_migratetype on a single page
will not set the migratetype for the full block. Fix this by rounding
down zone_start_pfn when doing the bitidx calculation.
For our use case, the effects of this bug were mostly tied to the fact
that CMA allocations would either take a long time or fail to happen.
Depending on the driver using CMA, this could result in anything from
visual glitches to application failures.
Xi Wang [Fri, 11 Jan 2013 22:31:48 +0000 (14:31 -0800)]
fs/exec.c: work around icc miscompilation
The tricky problem is this check:
if (i++ >= max)
icc (mis)optimizes this check as:
if (++i > max)
The check now becomes a no-op since max is MAX_ARG_STRINGS (0x7FFFFFFF).
This is "allowed" by the C standard, assuming i++ never overflows,
because signed integer overflow is undefined behavior. This
optimization effectively reverts the previous commit 362e6663ef23
("exec.c, compat.c: fix count(), compat_count() bounds checking") that
tries to fix the check.
Lin Feng [Fri, 11 Jan 2013 22:31:44 +0000 (14:31 -0800)]
mm: memblock: fix wrong memmove size in memblock_merge_regions()
The memmove span covers from (next+1) to the end of the array, and the
index of next is (i+1), so the index of (next+1) is (i+2). So the size
of remaining array elements is (type->cnt - (i + 2)).
Since the remaining elements of the memblock array are move forward by
one element and there is only one additional element caused by this bug.
So there won't be any write overflow here but read overflow. It may
read one more element out of the array address if the array happens to
be full. Commonly it doesn't matter at all but if the array happens to
be located at the end a memblock, it may cause a invalid read operation
for the physical address doesn't exist.
There are 2 *happens to be* here, so I think the probability is quite
low, I don't know if any guy is haunted by this bug before.
Maxime Ripard [Fri, 11 Jan 2013 22:31:43 +0000 (14:31 -0800)]
drivers/video/ssd1307fb.c: fix bit order bug in the byte translation function
This was leading to a strange behaviour when using the fbcon driver on
top of this one: the letters were in the right order, but each letter
had a vertical symmetry.
This was because the addressing was right for the byte, but the
addressing of each individual bit was inverted.
Mel Gorman [Fri, 11 Jan 2013 22:31:40 +0000 (14:31 -0800)]
mm: migrate: check page_count of THP before migrating
Hugh Dickins pointed out that migrate_misplaced_transhuge_page() does
not check page_count before migrating like base page migration and
khugepage. He could not see why this was safe and he is right.
The potential impact of the bug is avoided due to the limitations of
NUMA balancing. The page_mapcount() check ensures that only a single
address space is using this page and as THPs are typically private it
should not be possible for another address space to fault it in
parallel. If the address space has one associated task then it's
difficult to have both a GUP pin and be referencing the page at the same
time. If there are multiple tasks then a buggy scenario requires that
another thread be accessing the page while the direct IO is in flight.
This is dodgy behaviour as there is a possibility of corruption with or
without THP migration. It would be
While we happen to be safe for the most part it is shoddy to depend on
such "safety" so this patch checks the page count similar to anonymous
pages. Note that this does not mean that the page_mapcount() check can
go away. If we were to remove the page_mapcount() check the the THP
would have to be unmapped from all referencing PTEs, replaced with
migration PTEs and restored properly afterwards.
WARNING: drivers/rtc/rtc-da9055.o(.text+0xa71): Section mismatch in reference from the function da9055_rtc_probe() to the function .init.text:da9055_rtc_device_init()
David Decotigny [Fri, 11 Jan 2013 22:31:36 +0000 (14:31 -0800)]
lib: cpu_rmap: avoid flushing all workqueues
In some cases, free_irq_cpu_rmap() is called while holding a lock (eg
rtnl). This can lead to deadlocks, because it invokes
flush_scheduled_work() which ends up waiting for whole system workqueue
to flush, but some pending works might try to acquire the lock we are
already holding.
This commit uses reference-counting to replace
irq_run_affinity_notifiers(). It also removes
irq_run_affinity_notifiers() altogether.
Jesse Barnes [Wed, 14 Nov 2012 20:43:31 +0000 (20:43 +0000)]
x86/Sandy Bridge: reserve pages when integrated graphics is present
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table. So reserve those at boot if set detect a SNB gfx device on the
CPU to avoid GPU hangs.
Stephane Marchesin had a similar patch to the page allocator awhile
back, but rather than reserving pages up front, it leaked them at
allocation time.
[ hpa: made a number of stylistic changes, marked arrays as static
const, and made less verbose; use "memblock=debug" for full
verbosity. ]
Krzysztof Mazur [Fri, 11 Jan 2013 22:20:09 +0000 (23:20 +0100)]
cpuidle: fix number of initialized/destroyed states
Commit bf4d1b5ddb78f86078ac6ae0415802d5f0c68f92 (cpuidle: support
multiple drivers) changed the number of initialized state kobjects
in cpuidle_add_state_sysfs() from device->state_count to
drv->state_count, but left device->state_count in
cpuidle_remove_state_sysfs(). The values of these two fields may be
different, in which case a NULL pointer dereference may happen in
cpuidle_remove_state_sysfs(), for example. Fix this problem by making
cpuidle_add_state_sysfs() use device->state_count too (which restores
the original behavior of it).
Steven Rostedt [Fri, 11 Jan 2013 21:14:10 +0000 (16:14 -0500)]
tracing: Fix regression with irqsoff tracer and tracing_on file
Commit 02404baf1b47 "tracing: Remove deprecated tracing_enabled file"
removed the tracing_enabled file as it never worked properly and
the tracing_on file should be used instead. But the tracing_on file
didn't call into the tracers start/stop routines like the
tracing_enabled file did. This caused trace-cmd to break when it
enabled the irqsoff tracer.
If you just did "echo irqsoff > current_tracer" then it would work
properly. But the tool trace-cmd disables tracing first by writing
"0" into the tracing_on file. Then it writes "irqsoff" into
current_tracer and then writes "1" into tracing_on. Unfortunately,
the above commit changed the irqsoff tracer to check the tracing_on
status instead of the tracing_enabled status. If it's disabled then
it does not start the tracer internals.
The problem is that writing "1" into tracing_on does not call the
tracers "start" routine like writing "1" into tracing_enabled did.
This makes the irqsoff tracer not start when using the trace-cmd
tool, and is a regression for userspace.
Simple fix is to have the tracing_on file call the tracers start()
method when being enabled (and the stop() method when disabled).
Oliver Neukum [Thu, 29 Nov 2012 14:05:57 +0000 (15:05 +0100)]
USB: hub: handle claim of enabled remote wakeup after reset
Some touchscreens have buggy firmware which claims
remote wakeup to be enabled after a reset. They nevertheless
crash if the feature is cleared by the host.
Add a check for reset resume before checking for
an enabled remote wakeup feature. On compliant
devices the feature must be cleared after a reset anyway.
Linus Torvalds [Fri, 11 Jan 2013 20:09:04 +0000 (12:09 -0800)]
Merge tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfix from Trond Myklebust:
- Fix a socket lock leak in net/sunrpc/xprt.c
* tag 'nfs-for-3.8-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: Ensure we release the socket write lock if the rpc_task exits early
Fabio Estevam [Mon, 10 Dec 2012 02:58:36 +0000 (00:58 -0200)]
usb: imx21-hcd: Include missing linux/module.h
Include <linux/module.h>, so that the following errors are fixed:
drivers/usb/host/imx21-hcd.c:1929:20: error: expected declaration specifiers or '...' before string constant
drivers/usb/host/imx21-hcd.c:1930:15: error: expected declaration specifiers or '...' before string constant
drivers/usb/host/imx21-hcd.c:1931:16: error: expected declaration specifiers or '...' before string constant
drivers/usb/host/imx21-hcd.c:1932:14: error: expected declaration specifiers or '...' before string constant
Tamas Lengyel [Mon, 31 Dec 2012 20:44:30 +0000 (15:44 -0500)]
xen/privcmd: Relax access control in privcmd_ioctl_mmap
In the privcmd Linux driver two checks in the functions
privcmd_ioctl_mmap and privcmd_ioctl_mmap_batch are not needed as they
are trying to enforce hypervisor-level access control. They should be
removed as they break secondary control domains when performing dom0
disaggregation. Xen itself provides adequate security controls around
these hypercalls and these checks prevent those controls from
functioning as intended.
Linus Torvalds [Fri, 11 Jan 2013 17:05:28 +0000 (09:05 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull intel DRM fixes from Dave Airlie:
"Just intel fixes, including getting the Ironlake systems back to the
state they were in for 3.6."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Revert shrinker changes from "Track unbound pages"
drm/i915: Use pixel size for computing linear offsets into a sprite
drm/i915: Add DEBUG messages to all intel_create_user_framebuffer error paths
drm/i915: The sprite scaler on Ironlake also support YUV planes
drm: Only evict the blocks required to create the requested hole
drm/i915: Treat crtc->mode.clock == 0 as disabled
Revert "drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13"
drm/i915; Only increment the user-pin-count after successfully pinning the bo
Mel Gorman [Fri, 11 Jan 2013 09:27:01 +0000 (09:27 +0000)]
mm: compaction: Partially revert capture of suitable high-order page
Eric Wong reported on 3.7 and 3.8-rc2 that ppoll() got stuck when
waiting for POLLIN on a local TCP socket. It was easier to trigger if
there was disk IO and dirty pages at the same time and he bisected it to
commit 1fb3f8ca0e92 ("mm: compaction: capture a suitable high-order page
immediately when it is made available").
The intention of that patch was to improve high-order allocations under
memory pressure after changes made to reclaim in 3.6 drastically hurt
THP allocations but the approach was flawed. For Eric, the problem was
that page->pfmemalloc was not being cleared for captured pages leading
to a poor interaction with swap-over-NFS support causing the packets to
be dropped. However, I identified a few more problems with the patch
including the fact that it can increase contention on zone->lock in some
cases which could result in async direct compaction being aborted early.
In retrospect the capture patch took the wrong approach. What it should
have done is mark the pageblock being migrated as MIGRATE_ISOLATE if it
was allocating for THP and avoided races that way. While the patch was
showing to improve allocation success rates at the time, the benefit is
marginal given the relative complexity and it should be revisited from
scratch in the context of the other reclaim-related changes that have
taken place since the patch was first written and tested. This patch
partially reverts commit 1fb3f8ca "mm: compaction: capture a suitable
high-order page immediately when it is made available".
Laurent Pinchart [Fri, 11 Jan 2013 12:42:00 +0000 (09:42 -0300)]
[media] uvcvideo: Set error_idx properly for S_EXT_CTRLS failures
The uvc_set_ctrl() calls don't write to the hardware. A failure at that
point thus leaves the device in a clean state, with no control modified.
Set the error_idx field to the count value to reflect that, as per the
V4L2 specification.
TRY_EXT_CTRLS is unchanged and the error_idx field must always be set to
the failed control index in that case.
Laurent Pinchart [Fri, 11 Jan 2013 12:32:04 +0000 (09:32 -0300)]
[media] uvcvideo: Cleanup leftovers of partial revert
Commit ba68c8530a263dc4de440fa10bb20a1c5b9d4ff5 (Partly revert "[media]
uvcvideo: Set error_idx properly for extended controls API failures")
missed two modifications. Clean them up.
Dave Reisner [Wed, 2 Jan 2013 13:54:37 +0000 (08:54 -0500)]
debugfs: convert gid= argument from decimal, not octal
This patch technically breaks userspace, but I suspect that anyone who
actually used this flag would have encountered this brokenness, declared
it lunacy, and already sent a patch.
Thomas Schwinge [Fri, 16 Nov 2012 09:46:20 +0000 (10:46 +0100)]
sh: Fix FDPIC binary loader
Ensure that the aux table is properly initialized, even when optional features
are missing. Without this, the FDPIC loader did not work. This was meant to
be included in commit d5ab780305bb6d60a7b5a74f18cf84eb6ad153b1.
sh: clkfwk: bugfix: sh_clk_div_enable() care sh_clk_div_set_rate() if div6
764f4e4e33d18cde4dcaf8a0d860b749c6d6d08b
(sh: clkfwk: Use shared sh_clk_div_enable/disable())
shared enable/disable funcions for div4/div6.
But new sh_clk_div_enable() didn't care sh_clk_div_set_rate()
which is required on div6 clock.
This patch fixes it.
sh: define TASK_UNMAPPED_BASE as a page aligned constant
b4265f12340f809447b9a48055e88c444b480c89
(mm: use vm_unmapped_area() on sh architecture)
broke sh boot. This patch define TASK_UNMAPPED_BASE
as a page aligned constant to solve this issue.
Special thanks to Michel