Stuart Menefy [Tue, 14 Feb 2012 11:29:11 +0000 (11:29 +0000)]
sh: Fix error synchronising kernel page tables
The problem is caused by the interaction of two features in the Linux
memory management code.
A processes address space is described by a struct mm_struct, and
every thread has a pointer to the mm it should run in. The exception
to this are kernel threads, which don't have an mm, and so borrow
the mm from the last thread which ran. The system is bootstrapped
by the initial kernel thread using init's mm (even though init hasn't
been created yet, its mm is the static init_mm).
The other feature is how the kernel handles the page table which
describes the portion of the address space which is only visible when
executing inside the kernel, and which is shared by all threads. On
the SH4 the only portion of the kernel's address space which described
using the page table is called P3, from 0xc0000000 to 0xdfffffff. This
portion of the address space is divided into three:
- mappings for dma_alloc_coherent()
- mappings for vmalloc() and ioremap()
- fixmap mappings, primarily used in copy_user_pages() to create
kernel mappings of user pages with the correct cache colour.
To optimise the TLB miss handler we don't want to add an additional
condition which checks whether the faulting address is in the user or
the kernel portion of the address space, and so all page tables have a
common portion which describes the kernel part of the address
space. As the SH4 uses a two level page table, only the kernel portion
of first level page table (the pgd entries) is duplicated. These all
point to the same second level entries (the pte's), and so no memory
is wasted.
The reference page table for the kernel is called the swapper_pg_dir,
and when a new page table is created for a new process the kernel
portion of the page table is copied from swapper_pg_dir. This works
fine when changes only occur in the second level of the kernel's page
table, or the first level entries are created before any new user
processes. However if a change occurs to the first level of the page
table, and there are existing processes which don't have this entry in
their page table, this new entry needs to be added. This is done on
demand, when the kernel accesses a P3 address which isn't mapped using
the current page table, the code in vmalloc_fault() copies the entry
from the reference page table (swapper_pg_dir) into the current
processes page table.
The bug which this patch addresses is that the code in vmalloc_fault()
was not copying addresses which fell in the dma_alloc_coherent()
portion of the address space, and it should have been copying any P3
address.
Why we hadn't seen this before, and what made this hard to reproduce,
is that normally the kernel will have called dma_alloc_coherent(), and
accessed the memory mapping created, before any user process
runs. Typically drivers such as USB or SATA will have created and used
mappings of this type during the kernel initialisation, when probing
for the attached devices, before init runs. Ethernet is slightly
different, as it normally only creates and accesses
dma_alloc_coherent() mappings when the network is brought up, but if
kernel level IP configuration is used this will also occur before any
user space process runs. So the first reproduction of this problem
which we saw was occurred when USB and SATA were removed from the
kernel, and then bring up Ethernet from user space using ifconfig.
I'd like to thank Joseph Bormolini who did the hard work reducing the
problem to this simple to reproduce criteria.
In your case the situation is slightly different, and turns out to
depends on the exact kernel configuration (which we had) and your
ramdisk contents (which we didn't - hence the need for some assumptions).
In this case the problem is a side effect of kernel level module
loading. Kernel subsystems sometimes trigger the load of kernel
modules directly, for example the crypto subsystem tries to load the
cryptomgr and MTD tries to load modules for Flash partitioning if
these are not built into the kernel. This is done by the kernel
creating a user process which runs insmod to try and load the
appropriate module.
In order for this to cause problems the system must be running with a
initrd or initramfs, which contains an insmod executable - if the
kernel can't find an insmod to run, no user process is created, and
the problem doesn't occur. If an insmod is found, a process is
created to run it, which will inherit the kernel portion of the
swapper_pg_dir first level page table. It doesn't matter whether the
inmod is successful or not, but when the the kernel scheduler context
switches back to the kernel initialisation thread, the insmod's mm is
'borrowed' by the kernel thread, as it doesn't have an address space
of its own. (Reference counting is used to ensure this mm is not
destroyed, even though the user process which caused its creation may no
longer exist.) If this address space doesn't have a first level page
table entry for the consistent mappings, and a driver tries to access
such a mapping, we are in the same situation as described above,
except this time in a kernel thread rather than a user thread
executing inside the kernel.
memcg: fix Bad page state after replace_page_cache
My 9ce70c0240d0 "memcg: fix deadlock by inverting lrucare nesting" put a
nasty little bug into v3.3's version of mem_cgroup_replace_page_cache(),
sometimes used for FUSE. Replacing __mem_cgroup_commit_charge_lrucare()
by __mem_cgroup_commit_charge(), I used the "pc" pointer set up earlier:
but it's for oldpage, and needs now to be for newpage. Once oldpage was
freed, its PageCgroupUsed bit (cleared above but set again here) caused
"Bad page state" messages - and perhaps worse, being missed from newpage.
(I didn't find this by using FUSE, but in reusing the function for tmpfs.)
Olof Johansson [Thu, 19 Apr 2012 04:30:16 +0000 (21:30 -0700)]
Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixes
Here is another fixes series for AT91 designed for 3.4-rc.
We experienced some issues while compiling some drivers as modules: Joachim has
corrected several of them. We may reduce this number of exported values by
reworking some drivers, in the future.
Some drivers are also modified here, I would like to keep them in the series
as the modifications are really related with our recent move to irqdomains or
simply related with compiler annotations.
I keep dmaengine Kconfig modification in this "fixes" series. The DMA
driver will not be available for 9x5 SoC family otherwise.
* tag 'at91-fixes' of git://github.com/at91linux/linux-at91:
dmaengine: Kconfig: fix Atmel at_hdmac entry
USB: gadget/at91_udc: add gpio_to_irq() function to vbus interrupt
USB: ohci-at91: change annotations for probe/remove functions
leds-atmel-pwm.c: Make pwmled_probe() __devinit
ARM: at91: fix at91sam9261ek Ethernet dm9000 irq
ARM: at91: fix rm9200ek flash size
ARM: at91: remove empty at91_init_serial function
ARM: at91: fix typo in at91_pmc_base assembly declaration
ARM: at91: Export at91_matrix_base
ARM: at91: Export at91_pmc_base
ARM: at91: Export at91_ramc_base
ARM: at91: Export at91_st_base
Olof Johansson [Thu, 19 Apr 2012 04:28:16 +0000 (21:28 -0700)]
Merge branch 'fixes-for-arm-soc-20120416' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson into fixes
* 'fixes-for-arm-soc-20120416' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
ARM: ux500: update defconfig
ARM: ux500: Fix unmet direct dependency
ARM: ux500: wake secondary cpu via resched
Olof Johansson [Thu, 19 Apr 2012 04:25:52 +0000 (21:25 -0700)]
Merge tag 'v3.4-rc3-imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6 into fixes
ARM i.MX misc fixes for -rc
* tag 'v3.4-rc3-imx-fixes' of git://git.pengutronix.de/git/imx/linux-2.6:
ARM: imx: Fix imx5 idle logic bug
ARM: imx27-dt: Fix build due to removal of irq_domain_add_simple()
ARM: imx_v4_v5_defconfig: Add support for CONFIG_REGULATOR_FIXED_VOLTAGE
Paul Gortmaker [Mon, 16 Apr 2012 19:38:28 +0000 (15:38 -0400)]
ARM: bcmring: fix UART declarations
This error appeared in the bcmring_defconfig build:
CC arch/arm/mach-bcmring/core.o
arch/arm/mach-bcmring/core.c:55: error: macro "AMBA_APB_DEVICE" requires 6 arguments, but only 5 given
arch/arm/mach-bcmring/core.c:55: warning: type defaults to 'int' in declaration of 'AMBA_APB_DEVICE'
arch/arm/mach-bcmring/core.c:56: error: macro "AMBA_APB_DEVICE" requires 6 arguments, but only 5 given
arch/arm/mach-bcmring/core.c:56: warning: type defaults to 'int' in declaration of 'AMBA_APB_DEVICE'
arch/arm/mach-bcmring/core.c:134: error: 'uartA_device' undeclared here (not in a function)
arch/arm/mach-bcmring/core.c:135: error: 'uartB_device' undeclared here (not in a function)
make[2]: *** [arch/arm/mach-bcmring/core.o] Error 1
Alex Williamson [Wed, 18 Apr 2012 03:46:44 +0000 (21:46 -0600)]
KVM: lock slots_lock around device assignment
As pointed out by Jason Baron, when assigning a device to a guest
we first set the iommu domain pointer, which enables mapping
and unmapping of memory slots to the iommu. This leaves a window
where this path is enabled, but we haven't synchronized the iommu
mappings to the existing memory slots. Thus a slot being removed
at that point could send us down unexpected code paths removing
non-existent pinnings and iommu mappings. Take the slots_lock
around creating the iommu domain and initial mappings as well as
around iommu teardown to avoid this race.
Add missing "personality.h"
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)
Avi Kivity [Wed, 18 Apr 2012 12:03:04 +0000 (15:03 +0300)]
KVM: VMX: Fix kvm_set_shared_msr() called in preemptible context
kvm_set_shared_msr() may not be called in preemptible context,
but vmx_set_msr() does so:
BUG: using smp_processor_id() in preemptible [00000000] code: qemu-kvm/22713
caller is kvm_set_shared_msr+0x32/0xa0 [kvm]
Pid: 22713, comm: qemu-kvm Not tainted 3.4.0-rc3+ #39
Call Trace:
[<ffffffff8131fa82>] debug_smp_processor_id+0xe2/0x100
[<ffffffffa0328ae2>] kvm_set_shared_msr+0x32/0xa0 [kvm]
[<ffffffffa03a103b>] vmx_set_msr+0x28b/0x2d0 [kvm_intel]
...
Making kvm_set_shared_msr() work in preemptible is cleaner, but
it's used in the fast path. Making two variants is overkill, so
this patch just disables preemption around the call.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: use flexible array in fuse.h
fuse: allow nanosecond granularity
fuse: O_DIRECT support for files
fuse: fix nlink after unlink
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
"A couple of bug fixes, one of them is a TLB flush fix. Included as
well is one small coding style patch and a patch to update the default
configuration."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
[S390] Fix compile error in swab.h
[S390] Fix stfle() lowcore protection problem
[S390] cpum_cf: get rid of compile warnings
[S390] irq: simple coding style change
[S390] update default configuration
[S390] fix tlb flushing for page table pages
[S390] kernel: Use local_irq_save() for memcpy_real()
[S390] s390/char/vmur.c: fix memory leak
[S390] drivers/s390/block/dasd_eckd.c: add missing dasd_sfree_request
This driver anticipates pch_uart_verify_port() is not called
during installation.
However, actually pch_uart_verify_port() is called during
installation.
As a result, memory access violation occurs like below.
0. initial value: use_dma=0
1. starup()
- dma channel is not allocated because use_dma=0
2. pch_uart_verify_port()
- Set use_dma=1
3. UART processing acts DMA mode because use_dma=1
- memory access violation occurs!
This patch fixes the issue.
Solution:
Whenever pch_uart_verify_port() is called and then
dma channel is not allocated, the channel should be allocated.
Alexander Shiyan [Tue, 27 Mar 2012 08:22:49 +0000 (12:22 +0400)]
ARM: clps711x: serial driver hungs are a result of call disable_irq within ISR
Since 2.6.30-rc1 clps711x serial driver hungs system. This is a result
of call disable_irq from ISR. synchronize_irq waits for end of interrupt
and goes to infinite loop. This patch fix this problem.
Paul Gortmaker [Mon, 16 Apr 2012 04:52:51 +0000 (00:52 -0400)]
powerpc: fix system.h fallout in sysdev/scom.c [chroma_defconfig]
The following shows up in chroma_defconfig:
CC arch/powerpc/sysdev/scom.o
arch/powerpc/sysdev/scom.c: In function 'scom_debug_init':
arch/powerpc/sysdev/scom.c:182:36: error: 'powerpc_debugfs_root' undeclared (first use in this function)
arch/powerpc/sysdev/scom.c:182:36: note: each undeclared identifier is reported only once for each function it appears in
make[2]: *** [arch/powerpc/sysdev/scom.o] Error 1
make[1]: *** [arch/powerpc/sysdev/scom.o] Error 2
This call is not needed; the IRQ controller should (and does) set up
interrupts correctly. set_irq_flags() isn't exported to modules, to
this also fixes compilation of ehci-tegra.c as a module.
Tomoki Sekiyama [Thu, 29 Mar 2012 23:51:36 +0000 (08:51 +0900)]
USB: yurex: Fix missing URB_NO_TRANSFER_DMA_MAP flag in urb
Current probing code is setting URB_NO_TRANSFER_DMA_MAP flag into a wrong urb
structure, and this causes BUG_ON with some USB host implementations.
This patch fixes the issue.
Tomoki Sekiyama [Thu, 29 Mar 2012 23:51:28 +0000 (08:51 +0900)]
USB: yurex: Remove allocation of coherent buffer for setup-packet buffer
Removes allocation of coherent buffer for the control-request setup-packet
buffer from the yurex driver. Using coherent buffers for setup-packet is
obsolete and does not work with some USB host implementations.
Eric Dumazet [Wed, 18 Apr 2012 10:14:23 +0000 (10:14 +0000)]
tcp: fix retransmit of partially acked frames
Alexander Beregalov reported skb_over_panic errors and provided stack
trace.
I occurs commit a21d45726aca (tcp: avoid order-1 allocations on wifi and
tx path) added a regression, when a retransmit is done after a partial
ACK.
tcp_retransmit_skb() tries to aggregate several frames if the first one
has enough available room to hold the following ones payload. This is
controlled by /proc/sys/net/ipv4/tcp_retrans_collapse tunable (default :
enabled)
Problem is we must make sure _pskb_trim_head() doesnt fool
skb_availroom() when pulling some bytes from skb (this pull is done when
receiver ACK part of the frame).
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
fcaps: clear the same personality flags as suid when fcaps are used
mpi: Avoid using freed pointer in mpi_lshift_limbs()
Smack: move label list initialization
Lasse Collin [Wed, 18 Apr 2012 16:55:44 +0000 (19:55 +0300)]
xz: Enable BCJ filters on SPARC and 32-bit x86
The BCJ filters were meant to be enabled already on these
archs, but the xz_wrap.sh script was buggy. Enabling the
filters should give smaller kernel images.
xz_wrap.sh will now use $SRCARCH instead of $ARCH to detect
the architecture. That way it doesn't need to care about the
subarchs (like i386 vs. x86_64) since the BCJ filters don't
care either.
Alan Stern [Wed, 18 Apr 2012 15:33:00 +0000 (11:33 -0400)]
EHCI: always clear the STS_FLR status bit
This patch (as1544) fixes a problem affecting some EHCI controllers.
They can generate interrupts whenever the STS_FLR status bit is turned
on, even though that bit is masked out in the Interrupt Enable
register.
Since the driver doesn't use STS_FLR anyway, the patch changes the
interrupt routine to clear that bit whenever it is set, rather than
leaving it alone.
Merge tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev
Pull libara fixes from Jeff Garzik:
- Notable regression fix. Forbid dynamic runtime power management by
default, due to issues with suspend/resume and hotplug.
To re-enable, use sysfs.
- make ata_print_id atomic, due to ref from multiple contexts
- sata_mv warning fix
- ata_piix new PCI ID
* tag 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata: forbid port runtime pm by default, fixing regression
libata: make ata_print_id atomic
sata_mv: silence an uninitialized variable warning
ata_piix: IDE-mode SATA patch for Intel DH89xxCC DeviceIDs
drivers/block/xen-blkback/xenbus.c: In function 'xen_blkbk_discard':
drivers/block/xen-blkback/xenbus.c:419:4: warning: passing argument 1 of 'dev_warn' makes pointer from integer without a cast
+[enabled by default]
include/linux/device.h:894:5: note: expected 'const struct device *' but argument is of type 'long int'
It is unclear how that mistake made it in. It surely is wrong.
* commit 'c104f1fa1ecf4ee0fc06e31b1f77630b2551be81': (14566 commits)
cpufreq: OMAP: fix build errors: depends on ARCH_OMAP2PLUS
sparc64: Eliminate obsolete __handle_softirq() function
sparc64: Fix bootup crash on sun4v.
kconfig: delete last traces of __enabled_ from autoconf.h
Revert "kconfig: fix __enabled_ macros definition for invisible and un-selected symbols"
kconfig: fix IS_ENABLED to not require all options to be defined
irq_domain: fix type mismatch in debugfs output format
staging: android: fix mem leaks in __persistent_ram_init()
staging: vt6656: Don't leak memory in drivers/staging/vt6656/ioctl.c::private_ioctl()
staging: iio: hmc5843: Fix crash in probe function.
panic: fix stack dump print on direct call to panic()
drivers/rtc/rtc-pl031.c: enable clock on all ST variants
Revert "mm: vmscan: fix misused nr_reclaimed in shrink_mem_cgroup_zone()"
hugetlb: fix race condition in hugetlb_fault()
drivers/rtc/rtc-twl.c: use static register while reading time
drivers/rtc/rtc-s3c.c: add placeholder for driver private data
drivers/rtc/rtc-s3c.c: fix compilation error
MAINTAINERS: add PCDP console maintainer
memcg: do not open code accesses to res_counter members
drivers/rtc/rtc-efi.c: fix section mismatch warning
...
David Miller [Thu, 12 Apr 2012 18:37:30 +0000 (14:37 -0400)]
Fix modpost failures in fedora 17
The symbol table on x86-64 starts to have entries that have names
like:
_GLOBAL__sub_I_65535_0___mod_x86cpu_device_table
They are of type STT_FUNCTION and this one had a length of 18. This
matched the device ID validation logic and it barfed because the
length did not meet the device type's criteria.
--------------------
FATAL: arch/x86/crypto/aesni-intel: sizeof(struct x86cpu_device_id)=16 is not a modulo of the size of section __mod_x86cpu_device_table=18.
Fix definition of struct x86cpu_device_id in mod_devicetable.h
--------------------
These are some kind of compiler tool internal stuff being emitted and
not something we want to inspect in modpost's device ID table
validation code.
So skip the symbol if it is not of type STT_OBJECT.
David S. Miller [Wed, 18 Apr 2012 19:26:52 +0000 (15:26 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
From John:
Another batch of fixes intended for 3.4...
First up, we have a minor signedness fix for libertas from Amitkumar
Karwar. Next, Arend gives us a brcm80211 fix for correctly enabling
Tx FIFOs on channels 12 and 13. Bing Zhao gives us some register
address corrections for mwifiex. Felix give us a trio of fixes --
one for ath9k to wake the hardware properly from full sleep, one for
mac80211 to properly handle packets in cooked monitor mode, and one
for ensuring that the proper HT mode selection is honored.
Hauke gives us a bcma fix for handling the lack of an sprom. Jonathon
Bither gives us an ath5k fix for a missing THIS_MODULE build issue,
and another ath5k fix for an io mapping leak. Lukasz Kucharczyk
fixes a bitwise check in cfg80211, and Sujith gives us an ath9k fix
for assigning sequence numbers for fragmented frames. Finally, we
have a MAINTAINERS change from Wey-Yi Guy -- congrats to Johannes
Berg for taking the lead on iwlwifi. :-)
Andre Przywara [Mon, 9 Apr 2012 22:16:34 +0000 (18:16 -0400)]
hwmon: fam15h_power: fix bogus values with current BIOSes
Newer BKDG[1] versions recommend a different initialization value for
the running average range register in the northbridge. This improves
the power reading by avoiding counter saturations resulting in bogus
values for anything below about 80% of TDP power consumption.
Updated BIOSes will have this new value set up from the beginning,
but meanwhile we correct this value ourselves.
This needs to be done on all northbridges, even on those where the
driver itself does not register at.
This fixes the driver on all current machines to provide proper
values for idle load.
Stefan Behrens [Fri, 30 Mar 2012 11:58:32 +0000 (13:58 +0200)]
Btrfs: fix that check_int_data mount option was ignored
The bitfield member mount_opt was too small by one bit to hold the mount
option that enabled to include data extents in the integrity checker.
Since the same issue happened when the BTRFS_MOUNT_PANIC_ON_FATAL_ERROR
option was added (git rebase silently merges so that the increase of the
size of the bitfield member is lost), the bit limit was removed entirely.
Stefan Behrens [Mon, 19 Mar 2012 15:17:22 +0000 (16:17 +0100)]
Btrfs: fix btrfs_ioctl_dev_info() crash on missing device
When a filesystem is mounted with the degraded option, it is
possible that some of the devices are not there.
btrfs_ioctl_dev_info() crashs in this case because the device
name is a NULL pointer. This ioctl was only used for scrub.
Arne Jansen [Wed, 18 Apr 2012 08:27:16 +0000 (10:27 +0200)]
btrfs: don't return EINTR
It is basically a good thing if we are interruptible when waiting for
free space, but the generality in which it is implemented currently
leads to system calls being interruptible that are not documented this
way. For example git can't handle interrupted unlink(), leading to
corrupt repos under space pressure.
Instead we raise the bar to only be interruptible by SIGKILL.
Thanks to David Sterba for suggesting this.
Josef Bacik [Mon, 16 Apr 2012 13:42:26 +0000 (09:42 -0400)]
Btrfs: always store the mirror we read the eb from
A user reported a panic where we were trying to fix a bad mirror but the
mirror number we were giving was 0, which is invalid. This is because we
don't do the transid verification until after the read, so as far as the
read code is concerned the read was a success. So instead store the mirror
we read from so that if there is some failure post read we know which mirror
to try next and which mirror needs to be fixed if we find a good copy of the
block. Thanks,
Btrfs: fix max chunk size check in chunk allocator
Fix a bug, where in case we need to adjust stripe_size so that the
length of the resulting chunk is less than or equal to max_chunk_size,
DUP chunks turn out to be only half as big as they could be.
Jan Schmidt [Fri, 13 Apr 2012 10:28:08 +0000 (12:28 +0200)]
Btrfs: add missing read locks in backref.c
iref_to_path and iterate_irefs both increment the eb's refcount to use it
after releasing the path. Both depend on consistent data remaining in the
extent buffer and need a read lock to protect it.
Jan Schmidt [Fri, 13 Apr 2012 10:28:00 +0000 (12:28 +0200)]
Btrfs: don't call free_extent_buffer twice in iterate_irefs
Avoid calling free_extent_buffer more than once when the iterator function
returns non-zero. The only code that uses this is scrub repair for corrupted
nodatasum blocks.
Btrfs: Make free_ipath() deal gracefully with NULL pointers
Make free_ipath() behave like most other freeing functions in the
kernel and gracefully do nothing when passed a NULL pointer.
Besides this making the bahaviour consistent with functions such as
kfree(), vfree(), btrfs_free_path() etc etc, it also fixes a real NULL
deref issue in fs/btrfs/ioctl.c::btrfs_ioctl_ino_to_path(). In that
function we have this code:
If we ever take the true branch of that 'if' statement we'll end up
passing a NULL pointer to free_ipath() which will subsequently
dereference it and we'll go "Boom" :-(
This patch will avoid that.
Li Zefan [Mon, 12 Mar 2012 08:39:48 +0000 (16:39 +0800)]
Btrfs: avoid possible use-after-free in clear_extent_bit()
clear_extent_bit()
{
next_node = rb_next(&state->rb_node);
...
clear_state_bit(state); <-- this may free next_node
if (next_node) {
state = rb_entry(next_node);
...
}
}
clear_state_bit() calls merge_state() which may free the next node
of the passing extent_state, so clear_extent_bit() may end up
referencing freed memory.
The conversion was done blindly and is wrong, as it does not provide a
primary handler to disable the level type irq on the device level.
Neither does it set the IRQF_ONESHOT flag which handles that at the irq
line level. This can't be done as the interrupt might be shared, though
we might extend the core to force it.
So an interrupt on this line will wake up the thread, but immediately
unmask the irq after that. Due to the interrupt being level type the
hardware interrupt is raised over and over and prevents the irq thread
from handling it. Fail.
request_irq() unfortunately does not refuse such a request and the patch
was obviously never tested with real interrupts.
Arne Jansen [Sat, 25 Feb 2012 08:09:47 +0000 (09:09 +0100)]
btrfs: don't add both copies of DUP to reada extent tree
Normally when there are 2 copies of a block, we add both to the
reada extent tree and prefetch only the one that is easier to reach.
This way we can better utilize multiple devices.
In case of DUP this makes no sense as both copies reside on the
same device.
Arne Jansen [Sat, 25 Feb 2012 08:09:30 +0000 (09:09 +0100)]
btrfs: fix race in reada
When inserting into the radix tree returns EEXIST, get the existing
entry without giving up the spinlock in between.
There was a race for both the zones trees and the extent tree.
Li Zefan [Tue, 21 Feb 2012 09:04:28 +0000 (17:04 +0800)]
Btrfs: avoid setting ->d_op twice
Follow those instructions, and you'll trigger a warning in the
beginning of d_set_d_op():
# mkfs.btrfs /dev/loop3
# mount /dev/loop3 /mnt
# btrfs sub create /mnt/sub
# btrfs sub snap /mnt /mnt/snap
# touch /mnt/snap/sub
touch: cannot touch `tmp': Permission denied
__d_alloc() set d_op to sb->s_d_op (btrfs_dentry_operations), and
then simple_lookup() reset it to simple_dentry_operations, which
triggered the warning.
x86, apic: APIC code touches invalid MSR on P5 class machines
Current APIC code assumes MSR_IA32_APICBASE is present for all systems.
Pentium Classic P5 and friends didn't have this MSR. MSR_IA32_APICBASE
was introduced as an architectural MSR by Intel @ P6.
Code paths that can touch this MSR invalidly are when vendor == Intel &&
cpu-family == 5 and APIC bit is set in CPUID - or when you simply pass
lapic on the kernel command line, on a P5.
The below patch stops Linux incorrectly interfering with the
MSR_IA32_APICBASE for P5 class machines. Other code paths exist that
touch the MSR - however those paths are not currently reachable for a
conformant P5.
SUNRPC: register PipeFS file system after pernet sybsystem
PipeFS superblock creation routine relays on SUNRPC pernet data presense, which
is created on register_pernet_subsys() call in SUNRPC module init function.
Registering of PipeFS filesystem prior to registering of per-net subsystem
leads to races (mount of PipeFS can dereference uninitialized data).
[media] drxk: Does not unlock mutex if sanity check failed in scu_command()
If sanity check fails in scu_command(), goto error leads to unlock of
an unheld mutex. The check should not fail in reality, but it nevertheless
worth fixing.
Found by Linux Driver Verification project (linuxtesting.org).
[media] dvb_frontend: Fix a regression when switching back to DVB-S
There are some softwares (Kaffeine and likely xine) that uses a
DVBv5 call to switch to DVB-S2, but expects that a DVBv3 call to
switch back to DVB-S. Well, this is not right, as a DVBv3 call
doesn't know anything about delivery systems.
However, as, by accident, this used to work, we need to restore its
behavior, in order to avoid regressions with those softwares.
Reported on this Fedora 16 bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=812895
Thomas Gleixner [Wed, 18 Apr 2012 10:08:23 +0000 (12:08 +0200)]
tick: Fix oneshot broadcast setup really
Sven Joachim reported, that suspend/resume on rc3 trips over a NULL
pointer dereference. Linus spotted the clockevent handler being NULL.
commit fa4da365b(clockevents: tTack broadcast device mode change in
tick_broadcast_switch_to_oneshot()) tried to fix a problem with the
broadcast device setup, which was introduced in commit 77b0d60c5(
clockevents: Leave the broadcast device in shutdown mode when not
needed).
The initial commit avoided to set up the broadcast device when no
broadcast request bits were set, but that left the broadcast device
disfunctional. In consequence deep idle states which need the
broadcast device were not woken up.
commit fa4da365b tried to fix that by initializing the state of the
broadcast facility, but that missed the fact, that nothing initializes
the event handler and some other state of the underlying clock event
device.
The fix is to revert both commits and make only the mode setting of
the clock event device conditional on the state of active broadcast
users.
That initializes everything except the low level device mode, but this
happens when the broadcast functionality is invoked by deep idle.
Chris Wilson [Tue, 17 Apr 2012 19:37:00 +0000 (20:37 +0100)]
drm/i915: Do not set "Enable Panel Fitter" on SNB pageflips
Not only do the pageflip work without it at non-native modes (i.e. with
the panel fitter enabled), it also causes normal (non-pageflipped)
modesets to fail.
Paul Mundt [Wed, 18 Apr 2012 02:13:04 +0000 (19:13 -0700)]
ASoC: fsi: update for dmaengine prep_slave_sg fallout.
Leading up to the ->device_prep_slave_sg change in 185ecb5f4fd43911c35956d4cc7d94a1da30417f 'dmaengine: add context
parameter to prep_slave_sg and prep_dma_cyclic' a generic wrapper was
added in place to guard against the API change, though the fsi driver
wasn't updated in the process (presumably its dmaengine support hadn't
been merged yet at the time). This trivially switches over to the new
wrapper and gets it building again.
Eric Paris [Tue, 17 Apr 2012 20:26:54 +0000 (16:26 -0400)]
fcaps: clear the same personality flags as suid when fcaps are used
If a process increases permissions using fcaps all of the dangerous
personality flags which are cleared for suid apps should also be cleared.
Thus programs given priviledge with fcaps will continue to have address space
randomization enabled even if the parent tried to disable it to make it
easier to attack.
Christian Riesch [Mon, 16 Apr 2012 04:35:25 +0000 (04:35 +0000)]
davinci_mdio: Fix MDIO timeout check
Under heavy load (flood ping) it is possible for the MDIO timeout to
expire before the loop checks the GO bit again. This patch adds an
additional check whether the operation was done before actually
returning -ETIMEDOUT.
To reproduce this bug, flood ping the device, e.g., ping -f -l 1000
After some time, a "timed out waiting for user access" warning
may appear. And even worse, link may go down since the PHY reported a
timeout.
Semantically, rt6_clean_expires() wants to do rt->dst.from = NULL instead of
rt->dst.expires = 0. It is clearing the RTF_EXPIRES flag, so the union is going
to be treated as a pointer (dst.from) not a long (dst.expires).
Commit 1716a961 (ipv6: fix problem with expired dst cache) broke PMTU
discovery. rt6_update_expires() calls dst_set_expires(), which only updates
dst->expires if it has not been set previously (expires == 0) or if the new
expires is earlier than the current dst->expires.
rt6_update_expires() needs to zero rt->dst.expires, otherwise it will contain
ivalid data left over from rt->dst.from and will confuse dst_set_expires().
arcrimi_probe() calls BUGMSG() before register_netdev() happens. BUGMSG()
itself prints dev->name, but as the format string hasn't been expanded by
register_netdev() yet, the output contains bogus device name such as
arc%d: Given: node 00h, shmem 0h, irq 0
As we don't know the device name yet, just drop the prefix completely from
the debugging messages.
Jesper Juhl [Mon, 6 Feb 2012 09:07:04 +0000 (20:07 +1100)]
mpi: Avoid using freed pointer in mpi_lshift_limbs()
At the start of the function we assign 'a->d' to 'ap'. Then we use the
RESIZE_IF_NEEDED macro on 'a' - this may free 'a->d' and replace it
with newly allocaetd storage. In that case, we'll be operating on
freed memory further down in the function when we index into 'ap[]'.
Since we don't actually need 'ap' until after the use of the
RESIZE_IF_NEEDED macro we can just delay the assignment to it until
after we've potentially resized, thus avoiding the issue.
While I was there anyway I also changed the integer variable 'n' to be
const. It might as well be since we only assign to it once and use it
as a constant, and then the compiler will tell us if we ever assign to
it in the future.
A kernel with Smack enabled will fail if tmpfs has xattr support.
Move the initialization of predefined Smack label
list entries to the LSM initialization from the
smackfs setup. This became an issue when tmpfs
acquired xattr support, but was never correct.
Alan Stern [Tue, 17 Apr 2012 19:24:15 +0000 (15:24 -0400)]
EHCI: fix criterion for resuming the root hub
This patch (as1542) changes the criterion ehci-hcd uses to tell when
it needs to resume the controller's root hub. A resume is needed when
a port status change is detected, obviously, but only if the root hub
is currently suspended.
Right now the driver tests whether the root hub is running, and that
is not the correct test. In particular, if the controller has died
then the root hub should not be restarted. In addition, some buggy
hardware occasionally requires the root hub to be running and
sending out SOF packets even while it is nominally supposed to be
suspended.
In the end, the test needs to be changed. Rather than checking whether
the root hub is currently running, the driver will now check whether
the root hub is currently suspended. This will yield the correct
behavior in all cases.
These devices have a number of non serial interfaces as well. Use
the existing "Direct IP" blacklist to prevent binding to interfaces
which are handled by other drivers.
We also extend the "Direct IP" blacklist with with interfaces only
seen in "QMI" mode, assuming that these devices use the same
interface numbers for serial interfaces both in "Direct IP" and in
"QMI" mode.
Alan Stern [Tue, 17 Apr 2012 19:22:39 +0000 (15:22 -0400)]
USB: fix deadlock in bConfigurationValue attribute method
This patch (as154) fixes a self-deadlock that occurs when userspace
writes to the bConfigurationValue sysfs attribute for a hub with
children. The task tries to lock the bandwidth_mutex at a time when
it already owns the lock:
The attribute's method calls usb_set_configuration(),
which calls usb_disable_device() with the bandwidth_mutex
held.
usb_disable_device() unregisters the existing interfaces,
which causes the hub driver to be unbound.
The hub_disconnect() routine calls hub_quiesce(), which
calls usb_disconnect() for each of the hub's children.
usb_disconnect() attempts to acquire the bandwidth_mutex
around a call to usb_disable_device().
The solution is to make usb_disable_device() acquire the mutex for
itself instead of requiring the caller to hold it. Then the mutex can
cover only the bandwidth deallocation operation and not the region
where the interfaces are unregistered.
This has the potential to change system behavior slightly when a
config change races with another config or altsetting change. Some of
the bandwidth released from the old config might get claimed by the
other config or altsetting, make it impossible to restore the old
config in case of a failure. But since we don't try to recover from
config-change failures anyway, this doesn't matter.
[This should be marked for stable kernels that contain the commit fccf4e86200b8f5edd9a65da26f150e32ba79808 "USB: Free bandwidth when
usb_disable_device is called."
That commit was marked for stable kernels as old as 2.6.32.]
This patch makes it so that we identify FCoE rings earlier than
ixgbe_set_rx_buffer_len. Instead we identify the Rx FCoE rings at
allocation time in ixgbe_alloc_q_vector.
The motivation behind this change is to avoid memory corruption when FCoE
is enabled. Without this change we were initializing the rings at 0, and
2K on systems with 4K pages, then when we bumped the buffer size to 4K with
order 1 pages we were accessing offsets 2K and 6K instead of 0 and 4K.
This was resulting in memory corruptions.
Upon resume from standby, ixgbe may trigger the ASSERT_RTNL() in
netif_set_real_num_tx_queues(). The call stack is:
netif_set_real_num_tx_queues
ixgbe_set_num_queues
ixgbe_init_interrupt_scheme
ixgbe_resume
DMTIMER source selection on OMAP1 is broken. omap1_dm_timer_set_src()
tries to use __raw_{read,write}l() to read from and write to physical
addresses, but those functions take virtual addresses.
sparse caught this:
arch/arm/mach-omap1/timer.c:50:13: warning: incorrect type in argument 1 (different base types)
arch/arm/mach-omap1/timer.c:50:13: expected void const volatile [noderef] <asn:2>*<noident>
arch/arm/mach-omap1/timer.c:50:13: got unsigned int
arch/arm/mach-omap1/timer.c:52:9: warning: incorrect type in argument 1 (different base types)
arch/arm/mach-omap1/timer.c:52:9: expected void const volatile [noderef] <asn:2>*<noident>
arch/arm/mach-omap1/timer.c:52:9: got unsigned int
Fix by using omap_{read,writel}(), just like the other users of the
MOD_CONF_CTRL_1 register in the OMAP1 codebase. Of course, in the long term,
removing omap_{read,write}l() is the appropriate thing to do; but
this will take some work to do this cleanly.
Looks like this was caused by 97933d6 (ARM: OMAP1: dmtimer: conversion
to platform devices) that dangerously moved code and changed it in
the same patch.
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 regression fixes from Ted Ts'o:
"This fixes a scalability problem reported by Andi Kleen and Tim Chen;
they were quite secretive about the precise nature of their workload,
but they later admitted that it only showed up when they were using a
large sparse file, so the amount of data I/O that was needed was close
to zero.
I'm not sure how realistic this is and it's only a regression if you
consider changes made since 2.6.39 to be a "regression" vis-a-vis the
policy regarding post-merge window bug fixes, but Linus agreed it was
worth fixing, so I'm including it in this pull request.
This also fixes the journalled quota mount options, which I
accidentally broke while I was cleaning up the mount option handling."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix handling of journalled quota options
ext4: address scalability issue by removing extent cache statistics
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
"A bunch of endianness fixes and a couple of nfsd error value fixes.
Speaking of endianness stuff, I'm rather tempted to slap
ccflags-y += -D__CHECK_ENDIAN__
in fs/Makefile, if not making it default for the entire tree; nfsd
regressions I've caught make one hell of a pile and we'd obviously
benefit from having that kind of stuff caught earlier..."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
lockd: fix the endianness bug
ocfs2: ->e_leaf_clusters endianness breakage
ocfs2: ->rl_count endianness breakage
ocfs: ->rl_used breakage on big-endian
ocfs2: ->l_next_free_req breakage on big-endian
btrfs: btrfs_root_readonly() broken on big-endian
ext4: fix endianness breakage in ext4_split_extent_at()
nfsd: fix compose_entry_fh() failure exits
nfsd: fix error value on allocation failure in nfsd4_decode_test_stateid()
nfsd: fix endianness breakage in TEST_STATEID handling
nfsd: fix error values returned by nfsd4_lockt() when nfsd_open() fails
nfsd: fix b0rken error value for setattr on read-only mount
ASoC: core: Fix card RTD count for deferred probe.
Currently we increment the number of RTD's per card during the DAI link
bind. This can cause an incorrect RTD count when we cannot find a component
and defer the probe (and hence perform the DAI link bind for the card again).
Fix the count so that it is cleared before every card registration
and bind attempt.
ARM: OMAP: serial: Fix the ocp smart idlemode handling bug
The current serial UART code, while fidling with ocp idlemode bits,
forget about the smart idle wakeup bit even if it is supported by
UART IP block. This will lead to missing the module wakeup on OMAP's
where the smart idle wakeup is supported.
This was the root cause of the console sluggishness issue, I have been
observing on OMAP4 devices and also can be potential reason for some
other UART wakeup issues.
Assigning sequence number for frames without taking care
of the fragment field breaks transmission of fragmented frames.
Fix this by assigning the fragment number properly.