Larry Finger [Thu, 30 Oct 2014 04:17:09 +0000 (23:17 -0500)]
rtlwifi: rtl8192se: Fix duplicate calls to ieee80211_register_hw()
Driver rtlwifi has been modified to call ieee80211_register_hw()
from the probe routine; however, the existing call in the callback
routine for deferred firmware loading was not removed.
Larry Finger [Thu, 30 Oct 2014 04:17:08 +0000 (23:17 -0500)]
rtlwifi: rtl8192ce: rtl8192de: rtl8192se: Fix handling for missing get_btc_status
The recent changes in checking for Bluetooth status added some callbacks to code
in rtlwifi. To make certain that all callbacks are defined, a dummy routine has been
added to rtlwifi, and the drivers that need to use it are modified.
Marc Yang [Wed, 29 Oct 2014 17:14:34 +0000 (22:44 +0530)]
mwifiex: restart rxreorder timer correctly
During 11n RX reordering, if there is a hole in RX table,
driver will not send packets to kernel until the rxreorder
timer expires or the table is full.
However, currently driver always restarts rxreorder timer when
receiving a packet, which causes the timer hardly to expire.
So while connected with to 11n AP in a busy environment,
ping packets may get blocked for about 30 seconds.
This patch fixes this timer restarting by ensuring rxreorder timer
would only be restarted either timer is not set or start_win
has changed.
Dan Carpenter [Wed, 29 Oct 2014 15:48:05 +0000 (18:48 +0300)]
ath9k: fix some debugfs output
The right shift operation has higher precedence than the mask so we
left shift by "(i * 3)" and then immediately right shift by "(i * 3)"
then we mask. It should be left shift, mask, and then right shift.
Back in commit 5136b2da770d ("PCI: convert bus code to use dev_groups"),
I misstyped the 'enable' sysfs filename as 'enabled', which broke the
userspace API. This patch fixes that issue by renaming the file back.
Linus Torvalds [Thu, 30 Oct 2014 16:11:38 +0000 (09:11 -0700)]
Merge tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Although the diffstat looks scary, it's just because of the removal of
the dead code (s6000), thus it must not affect anything serious.
Other than that, all small fixes. The only core fix is zero-clear for
a PCM compat ioctl. The rest are driver-specific, bebob, sgtl500,
adau1761, intel-sst, ad1889 and a few HD-audio quirks as usual"
* tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Add workaround for CMI8888 snoop behavior
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
ALSA: bebob: Uninitialized id returned by saffirepro_both_clk_src_get
ALSA: hda/realtek - New SSID for Headset quirk
ALSA: ad1889: Fix probable mask then right shift defects
ALSA: bebob: fix wrong decoding of clock information for Terratec PHASE 88 Rack FW
ALSA: hda/realtek - Update restore default value for ALC283
ALSA: hda/realtek - Update restore default value for ALC282
ASoC: fsl: use strncpy() to prevent copying of over-long names
ASoC: adau1761: Fix input PGA volume
ASoC: s6000: remove driver
ASoC: Intel: HSW/BDW only support S16 and S24 formats.
ASoC: sgtl500: Document the required supplies
Jan Kara [Thu, 30 Oct 2014 14:53:17 +0000 (10:53 -0400)]
ext4: make ext4_ext_convert_to_initialized() return proper number of blocks
ext4_ext_convert_to_initialized() can return more blocks than are
actually allocated from map->m_lblk in case where initial part of the
on-disk extent is zeroed out. Luckily this doesn't have serious
consequences because the caller currently uses the return value
only to unmap metadata buffers. Anyway this is a data
corruption/exposure problem waiting to happen so fix it.
Jan Kara [Thu, 30 Oct 2014 14:53:17 +0000 (10:53 -0400)]
ext4: bail early when clearing inode journal flag fails
When clearing inode journal flag, we call jbd2_journal_flush() to force
all the journalled data to their final locations. Currently we ignore
when this fails and continue clearing inode journal flag. This isn't a
big problem because when jbd2_journal_flush() fails, journal is likely
aborted anyway. But it can still lead to somewhat confusing results so
rather bail out early.
Jan Kara [Thu, 30 Oct 2014 14:53:17 +0000 (10:53 -0400)]
ext4: bail out from make_indexed_dir() on first error
When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
fail, there's really something wrong with the fs and there's no point in
continuing further. Just return error from make_indexed_dir() in that
case. Also initialize frames array so that if we return early due to
error, dx_release() doesn't try to dereference uninitialized memory
(which could happen also due to error in do_split()).
Theodore Ts'o [Thu, 30 Oct 2014 14:53:17 +0000 (10:53 -0400)]
jbd2: use a better hash function for the revoke table
The old hash function didn't work well for 64-bit block numbers, and
used undefined (negative) shift right behavior. Use the generic
64-bit hash function instead.
Dmitry Monakhov [Thu, 30 Oct 2014 14:53:16 +0000 (10:53 -0400)]
ext4: prevent bugon on race between write/fcntl
O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked
twice inside ext4_file_write_iter() and __generic_file_write() which
result in BUG_ON inside ext4_direct_IO.
Darrick J. Wong [Thu, 30 Oct 2014 14:53:16 +0000 (10:53 -0400)]
ext4: disallow changing journal_csum option during remount
ext4 does not permit changing the metadata or journal checksum feature
flag while mounted. Until we decide to support that, don't allow a
remount to change the journal_csum flag (right now we silently fail to
change anything).
Jan Kara [Thu, 30 Oct 2014 14:53:16 +0000 (10:53 -0400)]
ext4: fix oops when loading block bitmap failed
When we fail to load block bitmap in __ext4_new_inode() we will
dereference NULL pointer in ext4_journal_get_write_access(). So check
for error from ext4_read_block_bitmap().
Jan Kara [Thu, 30 Oct 2014 14:52:57 +0000 (10:52 -0400)]
ext4: fix overflow when updating superblock backups after resize
When there are no meta block groups update_backups() will compute the
backup block in 32-bit arithmetics thus possibly overflowing the block
number and corrupting the filesystem. OTOH filesystems without meta
block groups larger than 16 TB should be rare. Fix the problem by doing
the counting in 64-bit arithmetics.
Roman Gushchin [Thu, 23 Oct 2014 03:32:27 +0000 (03:32 +0000)]
igb: don't reuse pages with pfmemalloc flag
Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.
Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.
If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.
This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.
In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.
VMWare's e1000 implementation does not seem to support unicast filtering.
This can be observed by configuring a macvlan interface on eth0 in a VM in
VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
Tested on 3.16.
Jan Kara [Wed, 22 Oct 2014 07:17:24 +0000 (09:17 +0200)]
rbd: Fix error recovery in rbd_obj_read_sync()
When we fail to allocate page vector in rbd_obj_read_sync() we just
basically ignore the problem and continue which will result in an oops
later. Fix the problem by returning proper error.
Mike Christie [Thu, 16 Oct 2014 06:50:19 +0000 (01:50 -0500)]
libceph: use memalloc flags for net IO
This patch has ceph's lib code use the memalloc flags.
If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.
The problem is when we send/receive block layer requests over the network
layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS uses these flags to tell the network/vm layer that it should
use memory reserves to fullfill allcation requests for structs like
skbs.
I am running ceph in a bunch of VMs in my laptop, so this patch was
not tested very harshly.
Ilya Dryomov [Thu, 9 Oct 2014 13:06:01 +0000 (17:06 +0400)]
rbd: use a single workqueue for all devices
Using one queue per device doesn't make much sense given that our
workfn processes "devices" and not "requests". Switch to a single
workqueue for all devices.
Ingo Molnar [Thu, 30 Oct 2014 06:37:37 +0000 (07:37 +0100)]
Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull two RCU fixes from Paul E. McKenney:
" - Complete the work of commit dd56af42bd82 (rcu: Eliminate deadlock
between CPU hotplug and expedited grace periods), which was
intended to allow synchronize_sched_expedited() to be safely
used when holding locks acquired by CPU-hotplug notifiers.
This commit makes the put_online_cpus() avoid the deadlock
instead of just handling the get_online_cpus().
- Complete the work of commit 35ce7f29a44a (rcu: Create rcuo
kthreads only for onlined CPUs), which was intended to allow
RCU to avoid allocating unneeded kthreads on systems where the
firmware says that there are more CPUs than are really present.
This commit makes rcu_barrier() aware of the mismatch, so that
it doesn't hang waiting for non-existent CPUs. "
mtd: cfi_cmdset_0001.c: fix resume for LH28F640BF chips
After '#echo mem > /sys/power/state' some devices can not be properly resumed
because apparently the MTD Partition Configuration Register has been reset
to default thus the rootfs cannot be mounted cleanly on resume.
An example of this can be found in the SA-1100 Developer's Manual at 9.5.3.3
where the second step of the Sleep Shutdown Sequence is described:
"An internal reset is applied to the SA-1100. All units are reset...".
As workaround we refresh the PCR value as done initially on chip setup.
This behavior and the fix are confirmed by our tests done on 2 different Zaurus
collie units with kernel 3.17.
Fixes: 812c5fa82bae: ("mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR") Signed-off-by: Dmitry Eremin-Solenikov <[email protected]> Signed-off-by: Andrea Adami <[email protected]> Cc: <[email protected]> # 3.16+ Signed-off-by: Brian Norris <[email protected]>
Frans Klaver [Mon, 27 Oct 2014 14:34:05 +0000 (15:34 +0100)]
mtd: omap: fix mtd devices not showing up
Since commit 6d178ef2fd5e ("mtd: nand: Move ELM driver and rename as
omap_elm"), I don't have any mtd devices present on my am335x. This
changes the link order of the omap_elm and omap2 objects, causing them
to probe in the wrong order.
To fix this, make elm_config defer probing until the omap_elm driver is
actually loaded.
Linus Torvalds [Wed, 29 Oct 2014 23:38:48 +0000 (16:38 -0700)]
Merge branch 'akpm' (incoming from Andrew Morton)
Merge misc fixes from Andrew Morton:
"21 fixes"
* emailed patches from Andrew Morton <[email protected]>: (21 commits)
mm/balloon_compaction: fix deflation when compaction is disabled
sh: fix sh770x SCIF memory regions
zram: avoid NULL pointer access in concurrent situation
mm/slab_common: don't check for duplicate cache names
ocfs2: fix d_splice_alias() return code checking
mm: rmap: split out page_remove_file_rmap()
mm: memcontrol: fix missed end-writeback page accounting
mm: page-writeback: inline account_page_dirtied() into single caller
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
drivers/rtc/rtc-bq32k.c: fix register value
memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
kernel/kmod: fix use-after-free of the sub_info structure
drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
mm, thp: fix collapsing of hugepages on madvise
drivers: of: add return value to of_reserved_mem_device_init()
mm: free compound page with correct order
gcov: add ARM64 to GCOV_PROFILE_ALL
fsnotify: next_i is freed during fsnotify_unmount_inodes.
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
...
mm/balloon_compaction: fix deflation when compaction is disabled
If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages
with balloon and doesn't set PagePrivate flag, as a result
balloon_page_dequeue() cannot get any pages because it thinks that all
of them are isolated. Without balloon compaction nobody can isolate
ballooned pages. It's safe to remove this check.
Mikulas Patocka [Wed, 29 Oct 2014 21:50:55 +0000 (14:50 -0700)]
mm/slab_common: don't check for duplicate cache names
The SLUB cache merges caches with the same size and alignment and there
was long standing bug with this behavior:
- create the cache named "foo"
- create the cache named "bar" (which is merged with "foo")
- delete the cache named "foo" (but it stays allocated because "bar"
uses it)
- create the cache named "foo" again - it fails because the name "foo"
is already used
That bug was fixed in commit 694617474e33 ("slab_common: fix the check
for duplicate slab names") by not warning on duplicate cache names when
the SLUB subsystem is used.
Recently, cache merging was implemented the with SLAB subsystem too, in 12220dea07f1 ("mm/slab: support slab merge")). Therefore we need stop
checking for duplicate names even for the SLAB subsystem.
d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
Currently the code checks not for ERR_PTR and will cuase an oops in
ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL().
Commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed
page migration to uncharge the old page right away. The page is locked,
unmapped, truncated, and off the LRU, but it could race with writeback
ending, which then doesn't unaccount the page properly:
test_clear_page_writeback() migration
wait_on_page_writeback()
TestClearPageWriteback()
mem_cgroup_migrate()
clear PCG_USED
mem_cgroup_update_page_stat()
if (PageCgroupUsed(pc))
decrease memcg pages under writeback
release pc->mem_cgroup->move_lock
The per-page statistics interface is heavily optimized to avoid a
function call and a lookup_page_cgroup() in the file unmap fast path,
which means it doesn't verify whether a page is still charged before
clearing PageWriteback() and it has to do it in the stat update later.
Rework it so that it looks up the page's memcg once at the beginning of
the transaction and then uses it throughout. The charge will be
verified before clearing PageWriteback() and migration can't uncharge
the page as long as that is still set. The RCU lock will protect the
memcg past uncharge.
As far as losing the optimization goes, the following test results are
from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
three times in a nested fashion, so that there are two negative passes
that don't account but still go through the new transaction overhead.
There is no actual difference:
old: 33.195102545 seconds time elapsed ( +- 0.01% )
new: 33.199231369 seconds time elapsed ( +- 0.03% )
The time spent in page_remove_rmap()'s callees still adds up to the
same, but the time spent in the function itself seems reduced:
Jan Kara [Wed, 29 Oct 2014 21:50:44 +0000 (14:50 -0700)]
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
a multiple of BITS_PER_LONG, they will try to shift a long value by
BITS_PER_LONG bits which is undefined. Change the functions to avoid
the undefined shift.
When hot removing memory, pgdat is set to 0 in try_offline_node(). But
if the pgdat is allocated by bootmem allocator, the clearing step is
skipped.
And when hot adding the same memory, the uninitialized pgdat is reused.
But free_area_init_node() checks wether pgdat is set to zero. As a
result, free_area_init_node() hits WARN_ON().
This patch clears pgdat which is allocated by bootmem allocator in
try_offline_node().
Marek Szyprowski [Wed, 29 Oct 2014 21:50:38 +0000 (14:50 -0700)]
drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
Fix unconditional initialization failure on non-exynos3250 SoCs.
Commit df9e26d093d3 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
introduced rtc source clock support, but also added initialization
failure on SoCs, which doesn't need such clock.
kernel/kmod: fix use-after-free of the sub_info structure
Found this in the message log on a s390 system:
BUG kmalloc-192 (Not tainted): Poison overwritten
Disabling lock debugging due to kernel taint
INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b
INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648
__slab_alloc.isra.47.constprop.56+0x5f6/0x658
kmem_cache_alloc_trace+0x106/0x408
call_usermodehelper_setup+0x70/0x128
call_usermodehelper+0x62/0x90
cgroup_release_agent+0x178/0x1c0
process_one_work+0x36e/0x680
worker_thread+0x2f0/0x4f8
kthread+0x10a/0x120
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648
__slab_free+0x94/0x560
kfree+0x364/0x3e0
call_usermodehelper_exec+0x110/0x1b8
cgroup_release_agent+0x178/0x1c0
process_one_work+0x36e/0x680
worker_thread+0x2f0/0x4f8
kthread+0x10a/0x120
kernel_thread_starter+0x6/0xc
kernel_thread_starter+0x0/0xc
There is a use-after-free bug on the subprocess_info structure allocated
by the user mode helper. In case do_execve() returns with an error
____call_usermodehelper() stores the error code to sub_info->retval, but
sub_info can already have been freed.
Regarding UMH_NO_WAIT, the sub_info structure can be freed by
__call_usermodehelper() before the worker thread returns from
do_execve(), allowing memory corruption when do_execve() failed after
exec_mmap() is called.
Regarding UMH_WAIT_EXEC, the call to umh_complete() allows
call_usermodehelper_exec() to continue which then frees sub_info.
To fix this race the code needs to make sure that the call to
call_usermodehelper_freeinfo() is always done after the last store to
sub_info->retval.
drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
Adds support for RTC device inside PM8941 PMIC. The RTC in this PMIC
have two register spaces. Thus the rtc-pm8xxx is slightly reworked to
reflect these differences.
The register set for different PMIC chips are selected on DT compatible
string base.
David Rientjes [Wed, 29 Oct 2014 21:50:31 +0000 (14:50 -0700)]
mm, thp: fix collapsing of hugepages on madvise
If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.
This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
until the final action of madvise_behavior(). This causes the
khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
the vma had previously had VM_NOHUGEPAGE set.
Fix this by passing the correct vma flags to the khugepaged mm slot
handler. There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.
It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_advise(), but I didn't want to introduce special case
behavior into madvise_behavior(). I think it's best to just let it
always set vma->vm_flags itself.
Marek Szyprowski [Wed, 29 Oct 2014 21:50:29 +0000 (14:50 -0700)]
drivers: of: add return value to of_reserved_mem_device_init()
Driver calling of_reserved_mem_device_init() might be interested if the
initialization has been successful or not, so add support for returning
error code.
This fixes a build warining caused by commit 7bfa5ab6fa1b ("drivers:
dma-coherent: add initialization from device tree"), which has been
merged without this change and without fixing function return value.
Yu Zhao [Wed, 29 Oct 2014 21:50:26 +0000 (14:50 -0700)]
mm: free compound page with correct order
Compound page should be freed by put_page() or free_pages() with correct
order. Not doing so will cause tail pages leaked.
The compound order can be obtained by compound_order() or use
HPAGE_PMD_ORDER in our case. Some people would argue the latter is
faster but I prefer the former which is more general.
This bug was observed not just on our servers (the worst case we saw is
11G leaked on a 48G machine) but also on our workstations running Ubuntu
based distro.
Jerry Hoemann [Wed, 29 Oct 2014 21:50:22 +0000 (14:50 -0700)]
fsnotify: next_i is freed during fsnotify_unmount_inodes.
During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:
spin_lock(&inode->i_lock);
if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
spin_unlock(&inode->i_lock);
continue;
}
As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.
Multiple crash dumps showed:
The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself. As this is not the value of list upon entry to the
function, the kernel never exits the loop.
To help narrow down problem, the call to list_del_init in
inode_sb_list_del was changed to list_del. This poisons the pointers in
the i_sb_list and causes a kernel to panic if it transverse a freed
inode.
Subsequent stress testing paniced in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.
We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.
The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget. However, the code doesn't do the
__iget call on next_i
if i_count == 0 or
if i_state & (I_FREEING | I_WILL_FREE)
The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list. This makes the handling of next_i more closely match the
handling of the variable "inode."
The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.
During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate. Advance next_i in those cases where
__iget is not done.
Joonsoo Kim [Wed, 29 Oct 2014 21:50:20 +0000 (14:50 -0700)]
mm/compaction.c: avoid premature range skip in isolate_migratepages_range
Commit edc2ca612496 ("mm, compaction: move pageblock checks up from
isolate_migratepages_range()") commonizes isolate_migratepages variants
and make them use isolate_migratepages_block().
isolate_migratepages_block() could stop the execution when enough pages
are isolated, but, there is no code in isolate_migratepages_range() to
handle this case. In the result, even if isolate_migratepages_block()
returns prematurely without checking all pages in the range,
isolate_migratepages_block() is called repeately on the following
pageblock and some pages in the previous range are skipped to check.
Then, CMA is failed frequently due to this fact.
To fix this problem, this patch let isolate_migratepages_range() know
the situation that enough pages are isolated and stop the isolation in
that case.
Note that isolate_migratepages() has no such problem, because, it always
stops the isolation after just one call of isolate_migratepages_block().
Wang Nan [Wed, 29 Oct 2014 21:50:18 +0000 (14:50 -0700)]
cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.
Commit ff7ee93f4715 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
corresponding kmemleak_free() is missing, which makes kmemleak be
wrongly disabled after memory offlining. Log is pasted at the end of
this commit message.
This patch add kmemleak_free() into free_page_cgroup(). During page
offlining, this patch removes corresponding entries in kmemleak rbtree.
After that, the freed memory can be allocated again by other subsystems
without killing kmemleak.
bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak
inet: frags: remove the WARN_ON from inet_evict_bucket
The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.
inet: frags: fix a race between inet_evict_bucket and inet_frag_kill
When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
- then the timer ref is removed, but inet_evict_bucket will not add
this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
- then the timer ref is kept until the evictor "expires" the queue, but
inet_frag_kill will remove the initial ref and will set
INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.
Tony Lindgren [Mon, 27 Oct 2014 20:05:54 +0000 (13:05 -0700)]
ARM: OMAP2+: Warn about deprecated legacy booting mode
We're moving omaps to use device tree based booting and already have
omap2, omap4, omap5, am335x and am437x booting in device tree only
mode.
Only omap3 still has legacy booting still around and we really want
to make that device tree only. So let's add a warning about deprecated
legacy booting so we get people to upgrade their boards to use device
tree based booting and find out about any remaining issues.
Note that for most boards we already have the .dts file and those can
be booted with without changing the bootloader using the appended
DTB mode.
Linus Torvalds [Wed, 29 Oct 2014 18:57:10 +0000 (11:57 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current kernel. This contains:
- Two error handling fixes from Jan Kara. One for null_blk on
failure to add a device, and the other for the block/scsi_ioctl
SCSI_IOCTL_SEND_COMMAND fixing up the error jump point.
- A commit added in the merge window for the bio integrity bits
unfortunately disabled merging for all requests if
CONFIG_BLK_DEV_INTEGRITY wasn't set. Reverse the logic, so that
integrity checking wont disallow merges when not enabled.
- A fix from Ming Lei for merging and generating too many segments.
This caused a BUG in virtio_blk.
- Two error handling printk() fixups from Robert Elliott, improving
the information given when we rate limit.
- Error handling fixup on elevator_init() failure from Sudip
Mukherjee.
- A fix from Tony Battersby, fixing up a memory leak in the
scatterlist handling with scsi-mq"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined
lib/scatterlist: fix memory leak with scsi-mq
block: fix wrong error return in elevator_init()
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
null_blk: Cleanup error recovery in null_add_dev()
blk-merge: recaculate segment if it isn't less than max segments
fs: clarify rate limit suppressed buffer I/O errors
fs: merge I/O error prints into one line
Linus Torvalds [Wed, 29 Oct 2014 18:52:35 +0000 (11:52 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
- workarounds for a couple of misbehaving Elan Touchscreens, by Adel
Gadllah
- fix for TransducerSerialNumber field implementation, by Jason Gerecke
- a couple of new HID usages (added by HUT), by Olivier Gay
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: input: Fix TransducerSerialNumber implementation
HID: add keyboard input assist hid usages
HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f
HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b
Linus Torvalds [Wed, 29 Oct 2014 18:47:42 +0000 (11:47 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull Integrity subsystem fix from James Morris:
"These changes fix a bug in xattr handling, where the evm and ima
inode_setxattr() functions do not check for empty xattrs being passed
from userspace (leading to user-triggerable null pointer
dereferences)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
evm: check xattr value length and type in evm_inode_setxattr()
ima: check xattr value length and type in the ima_inode_setxattr()
Linus Torvalds [Wed, 29 Oct 2014 18:11:44 +0000 (11:11 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux
Pull powerpc updates from Michael Ellerman:
"There's some bug fixes or cleanups to facilitate fixes, a MAINTAINERS
update, and a new syscall (bpf)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update
powerpc/numa: use cached value of update->cpu in update_cpu_topology
cxl: Fix PSL error due to duplicate segment table entries
powerpc/mm: Use appropriate ESID mask in copro_calculate_slb()
cxl: Refactor cxl_load_segment() and find_free_sste()
cxl: Disable secondary hash in segment table
Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors"
powernv: Use _GLOBAL_TOC for opal wrappers
powerpc: Wire up sys_bpf() syscall
MAINTAINERS: nx-842 driver maintainer change
powerpc/mm: Remove redundant #if case
powerpc/mm: Fix build error with hugetlfs disabled
Thomas Petazzoni [Mon, 20 Oct 2014 18:42:18 +0000 (19:42 +0100)]
ARM: 8180/1: mm: implement no-highmem fast path in kmap_atomic_pfn()
Since CONFIG_HIGHMEM got enabled on ARMv5 Kirkwood, we have noticed a
very significant drop in networking performance. The test were
conducted on an OpenBlocks A7 board. Without this patch, the outgoing
performance measured with iperf are:
- highmem OFF, TSO OFF 544 Mbit/s
- highmem OFF, TSO ON 942 Mbit/s
- highmem ON, TSO OFF 306 Mbit/s
- highmem ON, TSO ON 246 Mbit/s
On this Kirkwood platform, the L2 cache is a Feroceon cache, and with
this cache, all the range operations have to be done on virtual
addresses and not physical addresses. Therefore, whenever
CONFIG_HIGHMEM is enabled, the cache maintenance operations call
kmap_atomic_pfn() and kunmap_atomic().
However, kmap_atomic_pfn() does not implement the same fast path for
non-highmem pages as the one implemented in kmap_atomic(), and this is
one of the reason for the performance drop. While this patch does not
fully restore the performances, it clearly improves them a lot:
without patch with patch
- highmem ON, TSO OFF 306 Mbit/s 387 Mbit/s
- highmem ON, TSO ON 246 Mbit/s 434 Mbit/s
We're still far from the !CONFIG_HIGHMEM performances, but it does
improve a bit the situation.
Thanks a lot to Ezequiel Garcia and Gregory Clement for all the
testing work around this topic.
"I'd ask for one change. Please make all these messages start with
"L2C-310 OF" not "PL310 OF:". The device is described in ARM
documentation as a L2C-310 not PL310. (Also note the : is dropped
too - most of the other messages don't have the : either.)
The:
"PL310 OF: cache setting yield illegal associativity
PL310 OF: -1073346556 calculated, only 8 and 16 legal"
message could also be changed to something like:
"L2C-310 OF cache associativity %d invalid, only 8 or 16 permittedn"
Laura Abbott [Fri, 24 Oct 2014 18:49:01 +0000 (19:49 +0100)]
ARM: 8181/1: Drop extra return statement
Commit 513510ddba9650fc7da456eefeb0ead7632324f6
(common: dma-mapping: introduce common remapping functions)
managed to end up with an extra return statement from the
original patch. Drop it.
Dan Carpenter [Wed, 29 Oct 2014 15:47:07 +0000 (18:47 +0300)]
drm/radeon: remove some buggy dead code
The calculation of "num_shader_engines" has a precedence bug because
the right shift happens before the mask, but this variable is never used
so we can just delete it.
Richard Zhu [Mon, 27 Oct 2014 05:17:32 +0000 (13:17 +0800)]
PCI: imx6: Wait for clocks to stabilize after ref_en
For boards without a reset GPIO we skip the delay between enabling the
pcie_ref_clk and touching the RC registers for configuration. This hangs
the system if there isn't a proper delay to ensure the clocks are settled
in the DW PCIe core.
Also iMX6Q always needs an additional 10us delay to make sure the reset is
propagated through the core, as we don't have an explicitly controlled
reset input on this SoC.
This fixes a problem with 3fce0e882f61 ("PCI: imx6: Delay enabling
reference clock for SS until it stabilizes"): the kernel doesn't boot on
systems that don't pass the PCI GPIO reset in the DTB. This regression
affects mx6 nitrogen boards.
[bhelgaas: add regression info in changelog] Fixes: 3fce0e882f61 ("PCI: imx6: Delay enabling reference clock for SS until it stabilizes") Reported-by: Fabio Estevam <[email protected]> Tested-by: Fabio Estevam <[email protected]> Signed-off-by: Richard Zhu <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Acked-by: Lucas Stach <[email protected]>
Lv Zheng [Wed, 29 Oct 2014 03:33:49 +0000 (11:33 +0800)]
ACPI / EC: Fix regression due to conflicting firmware behavior between Samsung and Acer.
It is reported that Samsung laptops that need to poll events are broken by
the following commit:
Commit 3afcf2ece453e1a8c2c6de19cdf06da3772a1b08
Subject: ACPI / EC: Add support to disallow QR_EC to be issued when SCI_EVT isn't set
The behaviors of the 2 vendor firmwares are conflict:
1. Acer: OSPM shouldn't issue QR_EC unless SCI_EVT is set, firmware
automatically sets SCI_EVT as long as there is event queued up.
2. Samsung: OSPM should issue QR_EC whatever SCI_EVT is set, firmware
returns 0 when there is no event queued up.
This patch is a quick fix to distinguish the behaviors to make Acer
behavior only effective for Acer EC firmware so that the breakages on
Samsung EC firmware can be avoided.
Lv Zheng [Wed, 29 Oct 2014 03:33:43 +0000 (11:33 +0800)]
Revert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
It is reported that the following commit breaks Samsung hardware:
Commit: 558e4736f2e1b0e6323adf7a5e4df77ed6cfc1a4.
Subject: ACPI / EC: Add support to disallow QR_EC to be issued before
completing previous QR_EC
Which means the Samsung behavior conflicts with the Acer behavior.
1. Samsung may behave like:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
write QR_EC
read event
[ -event 1 ] SCI_EVT clear
Without the above commit, Samsung can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
write QR_EC
CAN prepare next QR_EC as SCI_EVT=1
read event
[ -event 1 ] SCI_EVT clear
write QR_EC
read event
[ -event 2 ] SCI_EVT clear
With the above commit, Samsung cannot work:
[ +event 1 ] SCI_EVT set
[ +event 2 ] SCI_EVT set
write QR_EC
read event
[ -event 1 ] SCI_EVT clear
CANNOT prepare next QR_EC as SCI_EVT=0
2. Acer may behave like:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
write QR_EC
read event
[ -event 1 ] SCI_EVT clear
[ +event 2 ] SCI_EVT set
Without the above commit, Acer cannot work when there is only 1 event:
[ +event 1 ] SCI_EVT set
write QR_EC
can prepared next QR_EC as SCI_EVT=1
read event
[ -event 1 ] SCI_EVT clear
CANNOT write QR_EC as SCI_EVT=0
With the above commit, Acer can work:
[ +event 1 ] SCI_EVT set
[ +event 2 ]
write QR_EC
read event
[ -event 1 ] SCI_EVT set
can prepare next QR_EC because SCI_EVT=0
CAN write QR_EC as SCI_EVT=1
Takashi Iwai [Wed, 29 Oct 2014 15:13:05 +0000 (16:13 +0100)]
ALSA: hda - Add workaround for CMI8888 snoop behavior
CMI8888 shows the stuttering playback when the snooping is disabled
on the audio buffer. Meanwhile, we've got reports that CORB/RIRB
doesn't work in the snooped mode. So, as a compromise, disable the
snoop only for CORB/RIRB and enable the snoop for the stream buffers.
The resultant patch became a bit ugly, unfortunately, but we still can
live with it.
Dan Carpenter [Wed, 29 Oct 2014 10:01:36 +0000 (13:01 +0300)]
Documentation/SubmittingPatches: Reported-by tags and permission
The reported-by text says you have to ask for permission, but that
should only be if the bug was reported in private. These days the
standard is to always give reported-by credit or it's considered a bit
rude.
Masami Hiramatsu [Mon, 27 Oct 2014 20:31:24 +0000 (16:31 -0400)]
perf probe: Trivial typo fix for --demangle
Replace "Disable" with "Enable", since --demangle option enables symbol
demangling, not disable it.
perf probe has --demangle and --no-demangle options, but the
command-line help (--help) shows only --demangle option. So it should
explain about --demangle.
Namhyung Kim [Mon, 6 Oct 2014 00:46:01 +0000 (09:46 +0900)]
perf callchain: Use global caching provided by libunwind
The libunwind provides two caching policy which are global and
per-thread. As perf unwinds callchains in a single thread, it'd
sufficient to use global caching.
This speeds up my perf report from 14s to 7s on a ~260MB data file.
Although the output sometimes contains a slight difference (~0.01% in
terms of number of lines printed) on callchains which were not resolved.
Paolo Bonzini [Mon, 27 Oct 2014 13:40:39 +0000 (14:40 +0100)]
KVM: emulator: fix execution close to the segment limit
Emulation of code that is 14 bytes to the segment limit or closer
(e.g. RIP = 0xFFFFFFF2 after reset) is broken because we try to read as
many as 15 bytes from the beginning of the instruction, and __linearize
fails when the passed (address, size) pair reaches out of the segment.
To fix this, let __linearize return the maximum accessible size (clamped
to 2^32-1) for usage in __do_insn_fetch_bytes, and avoid the limit check
by passing zero for the desired size.
For expand-down segments, __linearize is performing a redundant check.
(u32)(addr.ea + size - 1) <= lim can only happen if addr.ea is close
to 4GB; in this case, addr.ea + size - 1 will also fail the check against
the upper bound of the segment (which is provided by the D/B bit).
After eliminating the redundant check, it is simple to compute
the *max_size for expand-down segments too.
Now that the limit check is done in __do_insn_fetch_bytes, we want
to inject a general protection fault there if size < op_size (like
__linearize would have done), instead of just aborting.
This fixes booting Tiano Core from emulated flash with EPT disabled.
Paolo Bonzini [Mon, 27 Oct 2014 13:40:49 +0000 (14:40 +0100)]
KVM: emulator: fix error code for __linearize
The error code for #GP and #SS is zero when the segment is used to
access an operand or an instruction. It is only non-zero when
a segment register is being loaded; for limit checks this means
cases such as:
* for #GP, when RIP is beyond the limit on a far call (before the first
instruction is executed). We do not implement this check, but it
would be in em_jmp_far/em_call_far.
* for #SS, if the new stack overflows during an inter-privilege-level
call to a non-conforming code segment. We do not implement stack
switching at all.
Fabio Estevam [Wed, 29 Oct 2014 11:06:31 +0000 (12:06 +0100)]
ARM: 8182/1: l2c: Make l2x0_cache_size_of_parse() return 'int'
Since commit f3354ab67476dc80 ("ARM: 8169/1: l2c: parse cache properties from
ePAPR definitions") the following error is seen on imx6q:
[ 0.000000] PL310 OF: cache setting yield illegal associativity
[ 0.000000] PL310 OF: -2147097556 calculated, only 8 and 16 legal
As imx6q does not pass the "cache-size" and "cache-sets" properties in DT, the function l2x0_cache_size_of_parse() returns early and keep the 'associativity' pointer uninitialized.
To fix this problem, return error codes inside l2x0_cache_size_of_parse() and only use the 'associativity' pointer result if l2x0_cache_size_of_parse() succeeds.
Peter Zijlstra has attempted to help out, to clean up the mess:
https://lkml.org/lkml/2014/10/28/543
But has not received helpful and constructive replies which makes
me doubt wether it can all be finished in time until v3.18 is
released.
Despite various review feedback the author (Andi Kleen) has answered
only few of the review questions and has generally been uncooperative,
only giving replies when prompted repeatedly, and only giving minimal
answers instead of constructively explaining and helping along the effort.
That kind of behavior is not acceptable.
There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
commit from Andi Kleen:
Which is not yet resolved. The uncore driver is independent in theory,
but the crash makes me worry about how well all these patches were
tested and makes me uneasy about the level of interminging that the
Broadwell and Haswell code has received by the commits above.
As a first step to resolve the mess revert the Broadwell client commits
back to the v3.17 version, before we run out of time and problematic
code hits a stable upstream kernel.
( If the Haswell-EP crash is not resolved via a simple fix then we'll have
to revert the Haswell-EP uncore driver as well. )
The Broadwell client series has to be submitted in a clean fashion, with
single, well documented changes per patch. If they are submitted in time
and are accepted during review then they can possibly go into v3.19 but
will need additional scrutiny due to the rocky history of this patch set.
The commit which introduced TransducerSerialNumber (368c966) is missing
two crucial implementation details. Firstly, the commit does not set the
type/code/bit/max fields as expected later down the code which can cause
the driver to crash when a tablet with this usage is connected. Secondly,
the call to 'set_bit' causes MSC_PULSELED to be sent instead of the
expected MSC_SERIAL. This commit addreses both issues.
Dexuan Cui [Wed, 29 Oct 2014 10:53:37 +0000 (03:53 -0700)]
x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE
pte_pfn() returns a PFN of long (32 bits in 32-PAE), so "long <<
PAGE_SHIFT" will overflow for PFNs above 4GB.
Due to this issue, some Linux 32-PAE distros, running as guests on Hyper-V,
with 5GB memory assigned, can't load the netvsc driver successfully and
hence the synthetic network device can't work (we can use the kernel parameter
mem=3000M to work around the issue).
Ian Abbott [Mon, 20 Oct 2014 14:10:40 +0000 (15:10 +0100)]
staging: comedi: fix memory leak / bad pointer freeing for chanlist
As a follow-up to commit 6cab7a37f5c04 ("staging: comedi: (regression)
channel list must be set for COMEDI_CMD ioctl"), Hartley Sweeten pointed
out another couple of bugs stemming from commit 6cab7a37f5c04 ("staging:
comedi: comedi_fops: introduce __comedi_get_user_chanlist()").
Firstly, `do_cmdtest_ioctl()` never frees the kernel copy of the user
chanlist allocated by `__comedi_get_user_chanlist()`, so that memory is
leaked. Fix it by freeing the allocated kernel memory pointed to by
`cmd.chanlist` before that pointer is overwritten with its original
pointer to user memory before `cmd` is copied back to user-space.
Secondly, if `__comedi_get_user_chanlist()` returns an error,
`cmd->chanlist` is left unchanged and in fact will be a pointer to user
memory. This causes `do_cmd_ioctl()` to `goto cleanup` and call
`do_become_nonbusy()` which would attempt to free the memory pointed to
by the user-space pointer. Fix it by setting `cmd->chanlist` to NULL at
the start of `__comedi_get_user_chanlist()`.
A merge conflict between commits fbfd9c8a1782f33d7b67294b2a42587063e61c0c ("staging: comedi:
addi_apci_3120: use dma_alloc_coherent()") and aff5b1f8eb71b64bb613dc64c50b6904e89f79b9 ("staging: comedi: remove
comedi_fc module") left the COMEDI_ADDI_APCI_3120 config option
depending on VIRT_TO_BUS when it no longer needs to do so. The
dependency was removed by the first commit and accidentally reinstated
by the second commit. Remove the dependency again.
Ian Abbott [Tue, 28 Oct 2014 17:15:47 +0000 (17:15 +0000)]
staging: comedi: widen subdevice number argument in ioctl handlers
For the `COMEDI_LOCK`, `COMEDI_UNLOCK`, `COMEDI_CANCEL`, and
`COMEDI_POLL` ioctls the third argument is a comedi subdevice number.
This is passed as an `unsigned long`, but when it is passed down to the
ioctl command-specific handler functions `do_lock_ioctl()`,
`do_unlock_ioctl()`, `do_cancel_ioctl()`, and `do_poll_ioctl()`, the
value has been narrowed to an `unsigned int`. Pass through the argument
as an `unsigned long` to avoid truncating the value on 64-bit
architectures.