Git Repo - linux.git/log

drm/vmwgfx: Fix hash key computation

The hash key computation in vmw_cmdbuf_res_remove incorrectly didn't take
the resource type into account, contrary to all the other related functions.
This becomes important when the cmdbuf resource manager handles more than
one resource type.

Signed-off-by: Thomas Hellstrom <[email protected]>
Reviewed-by: Brian Paul <[email protected]>

drm/vmwgfx: fix lock breakage

After:

commit d059f652e73c35678d28d4cd09ab2cec89696af9
Author:     Daniel Vetter <[email protected]>
AuthorDate: Fri Jul 25 18:07:40 2014 +0200

    drm: Handle legacy per-crtc locking with full acquire ctx

drm_mode_cursor_common() was switched to use drm_modeset_(un)lock_crtc()
which uses full aquire ctx.  So dropping/reaquiring the lock via
drm_modeset_(un)lock() directly isn't the right thing to do, as lockdep
kindly points out.

The 'FIXME's about sorting out whether vmwgfx *really* needs to lock-all
for cursor updates still apply.

Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Jakob Bornecrantz <[email protected]>
Tested-by: Thomas Hellstrom <[email protected]>

powerpc/powernv: Properly fix LPC debugfs endianness

Endian is hard, especially when I designed a stupid FW interface, and
I should know better... oh well, this is attempt #2 at fixing this
properly. This time it seems to work with all access sizes and I
can run my flashing tool (which exercises all sort of access sizes
and types to access the SPI controller in the BMC) just fine.

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
CC: [email protected]
Signed-off-by: Michael Ellerman <[email protected]>

powerpc: do_notify_resume can be called with bad thread_info flags argument

Back in 7230c5644188 ("powerpc: Rework lazy-interrupt handling") we
added a call out to restore_interrupts() (written in c) before calling
do_notify_resume:

        bl      restore_interrupts
        addi    r3,r1,STACK_FRAME_OVERHEAD
        bl      do_notify_resume

Unfortunately do_notify_resume takes two arguments, the second one
being the thread_info flags:

void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)

We do populate r4 (the second argument) earlier, but
restore_interrupts() is free to muck it up all it wants. My guess is
the gcc compiler gods shone down on us and its register allocator
never used r4. Sometimes, rarely, luck is on our side.

LLVM on the other hand did trample r4.

Signed-off-by: Anton Blanchard <[email protected]>
Cc: [email protected]
Signed-off-by: Michael Ellerman <[email protected]>

drivers/net: macvtap and tun depend on INET

These drivers now call ipv6_proxy_select_ident(), which is defined
only if CONFIG_INET is enabled. However, they have really depended
on CONFIG_INET for as long as they have allowed sending GSO packets
from userland.

Reported-by: kbuild test robot <[email protected]>
Signed-off-by: Ben Hutchings <[email protected]>
Fixes: f43798c27684 ("tun: Allow GSO using virtio_net_hdr")
Fixes: b9fb9ee07e67 ("macvtap: add GSO/csum offload support")
Fixes: 5188cd44c55d ("drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets")
Signed-off-by: David S. Miller <[email protected]>

tracing/syscalls: Ignore numbers outside NR_syscalls' range

ARM has some private syscalls (for example, set_tls(2)) which lie
outside the range of NR_syscalls.  If any of these are called while
syscall tracing is being performed, out-of-bounds array access will
occur in the ftrace and perf sys_{enter,exit} handlers.

# trace-cmd record -e raw_syscalls:* true && trace-cmd report
...
true-653   [000]   384.675777: sys_enter:            NR 192 (0, 1000, 3, 4000022, ffffffff, 0)
true-653   [000]   384.675812: sys_exit:             NR 192 = 1995915264
true-653   [000]   384.675971: sys_enter:            NR 983045 (76f74480, 76f74000, 76f74b28, 76f74480, 76f76f74, 1)
true-653   [000]   384.675988: sys_exit:             NR 983045 = 0
...

# trace-cmd record -e syscalls:* true
[   17.289329] Unable to handle kernel paging request at virtual address aaaaaace
[   17.289590] pgd = 9e71c000
[   17.289696] [aaaaaace] *pgd=00000000
[   17.289985] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   17.290169] Modules linked in:
[   17.290391] CPU: 0 PID: 704 Comm: true Not tainted 3.18.0-rc2+ #21
[   17.290585] task: 9f4dab00 ti: 9e710000 task.ti: 9e710000
[   17.290747] PC is at ftrace_syscall_enter+0x48/0x1f8
[   17.290866] LR is at syscall_trace_enter+0x124/0x184

Fix this by ignoring out-of-NR_syscalls-bounds syscall numbers.

Commit cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
added the check for less than zero, but it should have also checked
for greater than NR_syscalls.

Link: http://lkml.kernel.org/p/[email protected]
Fixes: cd0980fc8add "tracing: Check invalid syscall nr while tracing syscalls"
Cc: [email protected] # 2.6.33+
Signed-off-by: Rabin Vincent <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>

Merge branch 'ufo-fix'

Ben Hutchings says:

====================
drivers/net,ipv6: Fix IPv6 fragment ID selection for virtio

The virtio net protocol supports UFO but does not provide for passing a
fragment ID for fragmentation of IPv6 packets. We used to generate a
fragment ID wherever such a packet was fragmented, but currently we
always use ID=0!

v2: Add blank lines after declarations
====================

Signed-off-by: David S. Miller <[email protected]>

drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets

UFO is now disabled on all drivers that work with virtio net headers,
but userland may try to send UFO/IPv6 packets anyway. Instead of
sending with ID=0, we should select identifiers on their behalf (as we
used to).

Signed-off-by: Ben Hutchings <[email protected]>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <[email protected]>

drivers/net: Disable UFO through virtio

IPv6 does not allow fragmentation by routers, so there is no
fragmentation ID in the fixed header. UFO for IPv6 requires the ID to
be passed separately, but there is no provision for this in the virtio
net protocol.

Until recently our software implementation of UFO/IPv6 generated a new
ID, but this was a bug. Now we will use ID=0 for any UFO/IPv6 packet
passed through a tap, which is even worse.

Unfortunately there is no distinction between UFO/IPv4 and v6
features, so disable UFO on taps and virtio_net completely until we
have a proper solution.

We cannot depend on VM managers respecting the tap feature flags, so
keep accepting UFO packets but log a warning the first time we do
this.

Signed-off-by: Ben Hutchings <[email protected]>
Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data")
Signed-off-by: David S. Miller <[email protected]>

net: skb_fclone_busy() needs to detect orphaned skb

Some drivers are unable to perform TX completions in a bound time.
They instead call skb_orphan()

Problem is skb_fclone_busy() has to detect this case, otherwise
we block TCP retransmits and can freeze unlucky tcp sessions on
mostly idle hosts.

Signed-off-by: Eric Dumazet <[email protected]>
Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues")
Signed-off-by: David S. Miller <[email protected]>

gre: Use inner mac length when computing tunnel length

Currently, skb_inner_network_header is used but this does not account
for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which
handles TEB and also should work with IP encapsulation in which case
inner mac and inner network headers are the same.

Tested: Ran TCP_STREAM over GRE, worked as expected.

Signed-off-by: Tom Herbert <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'mellanox-net'

Or Gerlitz says:

====================
mlx4 driver encapsulation/steering fixes

The 1st patch fixes a bug in the TX path that supports offloading the
TX checksum of (VXLAN) encapsulated TCP packets. It turns out that the
bug is revealed only when the receiver runs in non-offloaded mode, so
we somehow missed it so far... please queue it for -stable >= 3.14

The 2nd patch makes sure not to leak steering entry on error flow,
please queue it to 3.17-stable
====================

Signed-off-by: David S. Miller <[email protected]>

mlx4: Avoid leaking steering rules on flow creation error flow

If mlx4_ib_create_flow() attempts to create > 1 rules with the
firmware, and one of these registrations fail, we leaked the
already created flow rules.

One example of the leak is when the registration of the VXLAN ghost
steering rule fails, we didn't unregister the original rule requested
by the user, introduced in commit d2fce8a9060d "mlx4: Set
user-space raw Ethernet QPs to properly handle VXLAN traffic".

While here, add dump of the VXLAN portion of steering rules
so it can actually be seen when flow creation fails.

Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

net/mlx4_en: Don't attempt to TX offload the outer UDP checksum for VXLAN

For VXLAN/NVGRE encapsulation, the current HW doesn't support offloading
both the outer UDP TX checksum and the inner TCP/UDP TX checksum.

The driver doesn't advertize SKB_GSO_UDP_TUNNEL_CSUM, however we are wrongly
telling the HW to offload the outer UDP checksum for encapsulated packets,
fix that.

Fixes: 837052d0ccc5 ('net/mlx4_en: Add netdev support for TCP/IP
offloads of vxlan tunneling')
Signed-off-by: Or Gerlitz <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-10-30

This series contains updates to e1000, igb and ixgbe.

Francesco Ruggeri fixes an issue with e1000 where in a VM the driver did
not support unicast filtering.

Roman Gushchin fixes an issue with igb where the driver was re-using
mapped pages so that packets were still getting dropped even if all
the memory issues are gone and there is free memory.

Junwei Zhang found where in the ixgbe_clean_rx_ring() we were repeating
the assignment of NULL to the receive buffer skb and fixes it.

Emil fixes a race condition between setup_link and SFP detection routine
in the watchdog when setting the advertised speed.
====================

Signed-off-by: David S. Miller <[email protected]>

ipv4: Do not cache routing failures due to disabled forwarding.

If we cache them, the kernel will reuse them, independently of
whether forwarding is enabled or not. Which means that if forwarding is
disabled on the input interface where the first routing request comes
from, then that unreachable result will be cached and reused for
other interfaces, even if forwarding is enabled on them. The opposite
is also true.

This can be verified with two interfaces A and B and an output interface
C, where B has forwarding enabled, but not A and trying
ip route get $dst iif A from $src && ip route get $dst iif B from $src

Signed-off-by: Nicolas Cavallari <[email protected]>
Reviewed-by: Julian Anastasov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

fs: allow open(dir, O_TMPFILE|..., 0) with mode 0

The man page for open(2) indicates that when O_CREAT is specified, the
'mode' argument applies only to future accesses to the file:

Note that this mode applies only to future accesses of the newly
created file; the open() call that creates a read-only file
may well return a read/write file descriptor.

The man page for open(2) implies that 'mode' is treated identically by
O_CREAT and O_TMPFILE.

O_TMPFILE, however, behaves differently:

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);
assert(fd == -1);
assert(errno == EACCES);

int fd = open("/tmp", O_TMPFILE | O_RDWR, 0600);
assert(fd > 0);

For O_CREAT, do_last() sets acc_mode to MAY_OPEN only:

if (*opened & FILE_CREATED) {
/* Don't check for write permission, don't truncate */
open_flag &= ~O_TRUNC;
will_truncate = false;
acc_mode = MAY_OPEN;
path_to_nameidata(path, nd);
goto finish_open_created;
}

But for O_TMPFILE, do_tmpfile() passes the full op->acc_mode to
may_open().

This patch lines up the behavior of O_TMPFILE with O_CREAT. After the
inode is created, may_open() is called with acc_mode = MAY_OPEN, in
do_tmpfile().

A different, but related glibc bug revealed the discrepancy:
https://sourceware.org/bugzilla/show_bug.cgi?id=17523

The glibc lazily loads the 'mode' argument of open() and openat() using
va_arg() only if O_CREAT is present in 'flags' (to support both the 2
argument and the 3 argument forms of open; same idea for openat()).
However, the glibc ignores the 'mode' argument if O_TMPFILE is in
'flags'.

On x86_64, for open(), it magically works anyway, as 'mode' is in
RDX when entering open(), and is still in RDX on SYSCALL, which is where
the kernel looks for the 3rd argument of a syscall.

But openat() is not quite so lucky: 'mode' is in RCX when entering the
glibc wrapper for openat(), while the kernel looks for the 4th argument
of a syscall in R10. Indeed, the syscall calling convention differs from
the regular calling convention in this respect on x86_64. So the kernel
sees mode = 0 when trying to use glibc openat() with O_TMPFILE, and
fails with EACCES.

Signed-off-by: Eric Rannaud <[email protected]>
Acked-by: Andy Lutomirski <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>

Merge branch 'drm-fixes-3.18' of git://people.freedesktop.org/~agd5f/linux into drm-fixes

- dpm stability fixes for SI and KV
- remove an invalid pci id
- kmalloc_array fixes
- minor cleanups

* 'drm-fixes-3.18' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: remove some buggy dead code
  drm/radeon: remove invalid pci id
  drm/radeon: dpm fixes for asrock systems
  radeon: clean up coding style differences in radeon_get_bios()
  drm/radeon: Use drm_malloc_ab instead of kmalloc_array
  drm/radeon/dpm: disable ulv support on SI

Merge tag 'drm-intel-fixes-2014-10-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes

bunch of DP fixes, and some backlight fix.

* tag 'drm-intel-fixes-2014-10-30' of git://anongit.freedesktop.org/drm-intel:
  drm/i915/dp: only use training pattern 3 on platforms that support it
  drm/i915: Ignore VBT backlight check on Macbook 2, 1
  drm/i915: Fix GMBUSFREQ on vlv/chv
  drm/i915: Ignore long hpds on eDP ports
  drm/i915: Do a dummy DPCD read before the actual read

cxgb4 : Fix missing initialization of win0_lock

win0_lock was being used un-initialized, resulting in warning traces
being seen when lock debugging is enabled (and just wrong)

Fixes : fc5ab0209650 ('cxgb4: Replaced the backdoor mechanism to access the HW
memory with PCIe Window method')

Signed-off-by: Anish Bhatt <[email protected]>
Signed-off-by: Casey Leedom <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'r8152-net'

Hayes Wang says:

====================
r8152: patches for autosuspend

There are unexpected processes when enabling autosuspend.
These patches are used to fix them.
====================

Signed-off-by: David S. Miller <[email protected]>

r8152: check WORK_ENABLE in suspend function

Avoid unnecessary behavior when autosuspend occurs during open().
The relative processes should only be run after finishing open().

Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8152: reset tp->speed before autoresuming in open function

If (tp->speed & LINK_STATUS) is not zero, the rtl8152_resume()
would call rtl_start_rx() before enabling the tx/rx. Avoid this
by resetting it to zero.

Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

r8152: clear SELECTIVE_SUSPEND when autoresuming

The flag of SELECTIVE_SUSPEND should be cleared when autoresuming.
Otherwise, when the system suspend and resume occur, it may have
the wrong flow.

Besides, because the flag of SELECTIVE_SUSPEND couldn't be used
to check if the hw enables the relative feature, it should alwayes
be disabled in close().

Signed-off-by: Hayes Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

rtlwifi: rtl8192se: Fix firmware loading

An error in the code makes the allocated space for firmware to be too
small.

Signed-off-by: Larry Finger <[email protected]>
Cc: Murilo Opsfelder Araujo <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

rtlwifi: rtl8192ce: Add missing section to read descriptor setting

The new version of rtlwifi needs code in rtl92ce_get_desc() that returns
the buffer address for read operations.

Signed-off-by: Larry Finger <[email protected]>
Cc: Murilo Opsfelder Araujo <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

rtlwifi: rtl8192se: Add missing section to read descriptor setting

The new version of rtlwifi needs code in rtl92se_get_desc() that returns
the buffer address for read operations.

Signed-off-by: Larry Finger <[email protected]>
Cc: Murilo Opsfelder Araujo <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

rtlwifi: rtl8192se: Fix duplicate calls to ieee80211_register_hw()

Driver rtlwifi has been modified to call ieee80211_register_hw()
from the probe routine; however, the existing call in the callback
routine for deferred firmware loading was not removed.

Signed-off-by: Larry Finger <[email protected]>
Cc: Murilo Opsfelder Araujo <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

rtlwifi: rtl8192ce: rtl8192de: rtl8192se: Fix handling for missing get_btc_status

The recent changes in checking for Bluetooth status added some callbacks to code
in rtlwifi. To make certain that all callbacks are defined, a dummy routine has been
added to rtlwifi, and the drivers that need to use it are modified.

Signed-off-by: Larry Finger <[email protected]>
Cc: Murilo Opsfelder Araujo <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

mwifiex: restart rxreorder timer correctly

During 11n RX reordering, if there is a hole in RX table,
driver will not send packets to kernel until the rxreorder
timer expires or the table is full.
However, currently driver always restarts rxreorder timer when
receiving a packet, which causes the timer hardly to expire.
So while connected with to 11n AP in a busy environment,
ping packets may get blocked for about 30 seconds.

This patch fixes this timer restarting by ensuring rxreorder timer
would only be restarted either timer is not set or start_win
has changed.

Signed-off-by: Chin-Ran Lo <[email protected]>
Signed-off-by: Plus Chen <[email protected]>
Signed-off-by: Marc Yang <[email protected]>
Signed-off-by: Cathy Luo <[email protected]>
Signed-off-by: Avinash Patil <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

ath9k: fix some debugfs output

The right shift operation has higher precedence than the mask so we
left shift by "(i * 3)" and then immediately right shift by "(i * 3)"
then we mask. It should be left shift, mask, and then right shift.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

wireless: rt2x00: add new rt2800usb device

0x1b75 0xa200 AirLive WN-200USB wireless 11b/g/n dongle

References: https://bugs.debian.org/766802
Reported-by: Martin Mokrejs <[email protected]>
Cc: [email protected]
Signed-off-by: Cyril Brulebois <[email protected]>
Acked-by: Stanislaw Gruszka <[email protected]>
Signed-off-by: John W. Linville <[email protected]>

PCI: Rename sysfs 'enabled' file back to 'enable'

Back in commit 5136b2da770d ("PCI: convert bus code to use dev_groups"),
I misstyped the 'enable' sysfs filename as 'enabled', which broke the
userspace API. This patch fixes that issue by renaming the file back.

Fixes: 5136b2da770d ("PCI: convert bus code to use dev_groups")
Reported-by: Jeff Epler <[email protected]>
Tested-by: Jeff Epler <[email protected]> # on v3.14-rt
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
CC: [email protected] # 3.13

Merge tag 'fbdev-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux

Pull fbdev fixes from Tomi Valkeinen:

- fix fb console option parsing

- fixes for OMAPDSS/OMAPFB crashes related to module unloading and
   device/driver binding & unbinding.

- fix for OMAP HDMI PLL locking failing in certain cases

- misc minor fixes for atmel lcdfb and OMAP

* tag 'fbdev-fixes-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  omap: dss: connector-analog-tv: Add missing module device table
  OMAPDSS: DSI: Fix PLL_SELFEQDCO field width
  OMAPDSS: fix dispc register dump for preload & mflag
  OMAPDSS: DISPC: fix mflag offset
  OMAPDSS: HDMI: fix regsd write
  OMAPDSS: HDMI: fix PLL GO bit handling
  OMAPFB: fix releasing overlays
  OMAPFB: fix overlay disable when freeing resources.
  OMAPDSS: apply: wait pending updates on manager disable
  OMAPFB: remove __exit annotation
  OMAPDSS: set suppress_bind_attrs
  OMAPFB: add missing MODULE_ALIAS()
  drivers: video: fbdev: atmel_lcdfb.c: remove unnecessary header
  video/console: Resolve several shadow warnings
  fbcon: Fix option parsing control flow in fb_console_setup

Merge tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Although the diffstat looks scary, it's just because of the removal of
  the dead code (s6000), thus it must not affect anything serious.

  Other than that, all small fixes.  The only core fix is zero-clear for
  a PCM compat ioctl.  The rest are driver-specific, bebob, sgtl500,
  adau1761, intel-sst, ad1889 and a few HD-audio quirks as usual"

* tag 'sound-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add workaround for CMI8888 snoop behavior
  ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat mode
  ALSA: bebob: Uninitialized id returned by saffirepro_both_clk_src_get
  ALSA: hda/realtek - New SSID for Headset quirk
  ALSA: ad1889: Fix probable mask then right shift defects
  ALSA: bebob: fix wrong decoding of clock information for Terratec PHASE 88 Rack FW
  ALSA: hda/realtek - Update restore default value for ALC283
  ALSA: hda/realtek - Update restore default value for ALC282
  ASoC: fsl: use strncpy() to prevent copying of over-long names
  ASoC: adau1761: Fix input PGA volume
  ASoC: s6000: remove driver
  ASoC: Intel: HSW/BDW only support S16 and S24 formats.
  ASoC: sgtl500: Document the required supplies

MAINTAINERS: drop list entry for davinci

As [email protected] is now
shut and no more maintained by TI, drop this entry from
DAVINCI MACHINE SUPPORT and DAVINCI SERIES MEDIA DRIVER.

Signed-off-by: Lad, Prabhakar <[email protected]>
Cc: Kevin Hilman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Acked-by: Sekhar Nori <[email protected]>
Signed-off-by: Kevin Hilman <[email protected]>

ext4: make ext4_ext_convert_to_initialized() return proper number of blocks

ext4_ext_convert_to_initialized() can return more blocks than are
actually allocated from map->m_lblk in case where initial part of the
on-disk extent is zeroed out. Luckily this doesn't have serious
consequences because the caller currently uses the return value
only to unmap metadata buffers. Anyway this is a data
corruption/exposure problem waiting to happen so fix it.

Coverity-id: 1226848
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: bail early when clearing inode journal flag fails

When clearing inode journal flag, we call jbd2_journal_flush() to force
all the journalled data to their final locations. Currently we ignore
when this fails and continue clearing inode journal flag. This isn't a
big problem because when jbd2_journal_flush() fails, journal is likely
aborted anyway. But it can still lead to somewhat confusing results so
rather bail out early.

Coverity-id: 989044
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: bail out from make_indexed_dir() on first error

When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
fail, there's really something wrong with the fs and there's no point in
continuing further. Just return error from make_indexed_dir() in that
case. Also initialize frames array so that if we return early due to
error, dx_release() doesn't try to dereference uninitialized memory
(which could happen also due to error in do_split()).

Coverity-id: 741300
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

jbd2: use a better hash function for the revoke table

The old hash function didn't work well for 64-bit block numbers, and
used undefined (negative) shift right behavior. Use the generic
64-bit hash function instead.

Signed-off-by: Theodore Ts'o <[email protected]>
Reported-by: Andrey Ryabinin <[email protected]>

ext4: prevent bugon on race between write/fcntl

O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked
twice inside ext4_file_write_iter() and __generic_file_write() which
result in BUG_ON inside ext4_direct_IO.

Let's initialize iocb->private unconditionally.

TESTCASE: xfstest:generic/036  https://patchwork.ozlabs.org/patch/402445/

#TYPICAL STACK TRACE:
kernel BUG at fs/ext4/inode.c:2960!
invalid opcode: 0000 [#1] SMP
Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod
CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161
Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011
task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000
RIP: 0010:[<ffffffff811fabf2>]  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
RSP: 0018:ffff88080f90bb58  EFLAGS: 00010246
RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818
RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001
RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80
R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28
FS:  00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0
Stack:
ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30
0000000000000200 0000000000000200 0000000000000001 0000000000000200
ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200
Call Trace:
[<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180
[<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370
[<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400
[<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
[<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410
[<ffffffff810990e5>] ? local_clock+0x25/0x30
[<ffffffff810abd94>] ? __lock_acquire+0x274/0x700
[<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0
[<ffffffff811bd756>] aio_run_iocb+0x286/0x410
[<ffffffff810990e5>] ? local_clock+0x25/0x30
[<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190
[<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0
[<ffffffff811bde3b>] do_io_submit+0x55b/0x740
[<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740
[<ffffffff811be030>] SyS_io_submit+0x10/0x20
[<ffffffff815ce192>] system_call_fastpath+0x16/0x1b
Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c
24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89
RIP  [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0
RSP <ffff88080f90bb58>

Reported-by: Sasha Levin <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Signed-off-by: Dmitry Monakhov <[email protected]>
Cc: [email protected]

ext4: remove extent status procfs files if journal load fails

If we can't load the journal, remove the procfs files for the extent
status information file to avoid leaking resources.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

ext4: disallow changing journal_csum option during remount

ext4 does not permit changing the metadata or journal checksum feature
flag while mounted. Until we decide to support that, don't allow a
remount to change the journal_csum flag (right now we silently fail to
change anything).

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: enable journal checksum when metadata checksum feature enabled

If metadata checksumming is turned on for the FS, we need to tell the
journal to use checksumming too.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Cc: [email protected]

ext4: fix oops when loading block bitmap failed

When we fail to load block bitmap in __ext4_new_inode() we will
dereference NULL pointer in ext4_journal_get_write_access(). So check
for error from ext4_read_block_bitmap().

Coverity-id: 989065
Cc: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>

ext4: fix overflow when updating superblock backups after resize

When there are no meta block groups update_backups() will compute the
backup block in 32-bit arithmetics thus possibly overflowing the block
number and corrupting the filesystem. OTOH filesystems without meta
block groups larger than 16 TB should be rare. Fix the problem by doing
the counting in 64-bit arithmetics.

Coverity-id: 741252
CC: [email protected]
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Reviewed-by: Lukas Czerner <[email protected]>

drm/i915/dp: only use training pattern 3 on platforms that support it

Ivybridge + 30" monitor prints a drm error on every modeset, since IVB
doesn't support DP3 we should even bother trying to use it.

This regression has been introduced in

commit 06ea66b6bb445043dc25a9626254d5c130093199
Author: Todd Previte <[email protected]>
Date: Mon Jan 20 10:19:39 2014 -0700

drm/i915: Enable 5.4Ghz (HBR2) link rate for Displayport 1.2-capable
devices

Reported-by: Dave Airlie <[email protected]>
Reference: http://mid.gmane.org/1414566170 [email protected]
Cc: Todd Previte <[email protected]>
Cc: [email protected] (3.15+)
Reviewed-by: Ville Syrjälä <[email protected]>
Signed-off-by: Jani Nikula <[email protected]>

Merge branch '3.18/omapdss-fixes' into 3.18/fbdev-fixes

omap: dss: connector-analog-tv: Add missing module device table

Without that fix connector-analog-tv driver isn't probed when compiled
as module.

Signed-off-by: H. Nikolaus Schaller <[email protected]>
Signed-off-by: Tomi Valkeinen <[email protected]>

ixgbe: fix race when setting advertised speed

Following commands:

modprobe ixgbe
ifconfig ethX up
ethtool -s ethX advertise 0x020

can lead to "setup link failed with code -14" error due to the setup_link
call racing with the SFP detection routine in the watchdog.

This patch resolves this issue by protecting the setup_link call with check
for __IXGBE_IN_SFP_INIT.

Reported-by: Scott Harrison <[email protected]>
Signed-off-by: Emil Tantilov <[email protected]>
Tested-by: Phil Schmitt <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

ixgbe: need not repeat init skb with NULL

Signed-off-by: Martin Zhang <[email protected]>
Tested-by: Phil Schmitt <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

igb: don't reuse pages with pfmemalloc flag

Incoming packet is dropped silently by sk_filter(), if the skb was
allocated from pfmemalloc reserves and the corresponding socket is
not marked with the SOCK_MEMALLOC flag.

Igb driver allocates pages for DMA with __skb_alloc_page(), which
calls alloc_pages_node() with the __GFP_MEMALLOC flag. So, in case
of OOM condition, igb can get pages with pfmemalloc flag set.

If an incoming packet hits the pfmemalloc page and is large enough
(small packets are copying into the memory, allocated with
netdev_alloc_skb_ip_align(), so they are not affected), it will be
dropped.

This behavior is ok under high memory pressure, but the problem is
that the igb driver reuses these mapped pages. So, packets are still
dropping even if all memory issues are gone and there is a plenty
of free memory.

In my case, some TCP sessions hang on a small percentage (< 0.1%)
of machines days after OOMs.

Fix this by avoiding reuse of such pages.

Signed-off-by: Roman Gushchin <[email protected]>
Tested-by: Aaron Brown "[email protected]"
Signed-off-by: Jeff Kirsher <[email protected]>

e1000: unset IFF_UNICAST_FLT on WMware 82545EM

VMWare's e1000 implementation does not seem to support unicast filtering.
This can be observed by configuring a macvlan interface on eth0 in a VM in
VMWare Fusion 5.0.5, and trying to use that interface instead of eth0.
Tested on 3.16.

Signed-off-by: Francesco Ruggeri <[email protected]>
Tested-by: Aaron Brown <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>

rbd: Fix error recovery in rbd_obj_read_sync()

When we fail to allocate page vector in rbd_obj_read_sync() we just
basically ignore the problem and continue which will result in an oops
later. Fix the problem by returning proper error.

CC: Yehuda Sadeh <[email protected]>
CC: Sage Weil <[email protected]>
CC: [email protected]
CC: [email protected]
Coverity-id: 1226882
Signed-off-by: Jan Kara <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>

libceph: use memalloc flags for net IO

This patch has ceph's lib code use the memalloc flags.

If the VM layer needs to write data out to free up memory to handle new
allocation requests, the block layer must be able to make forward progress.
To handle that requirement we use structs like mempools to reserve memory for
objects like bios and requests.

The problem is when we send/receive block layer requests over the network
layer, net skb allocations can fail and the system can lock up.
To solve this, the memalloc related flags were added. NBD, iSCSI
and NFS uses these flags to tell the network/vm layer that it should
use memory reserves to fullfill allcation requests for structs like
skbs.

I am running ceph in a bunch of VMs in my laptop, so this patch was
not tested very harshly.

Signed-off-by: Mike Christie <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>

rbd: use a single workqueue for all devices

Using one queue per device doesn't make much sense given that our
workfn processes "devices" and not "requests". Switch to a single
workqueue for all devices.

Signed-off-by: Ilya Dryomov <[email protected]>
Reviewed-by: Sage Weil <[email protected]>

Merge branch 'urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent

Pull two RCU fixes from Paul E. McKenney:

" - Complete the work of commit dd56af42bd82 (rcu: Eliminate deadlock
    between CPU hotplug and expedited grace periods), which was
    intended to allow synchronize_sched_expedited() to be safely
    used when holding locks acquired by CPU-hotplug notifiers.
    This commit makes the put_online_cpus() avoid the deadlock
    instead of just handling the get_online_cpus().

  - Complete the work of commit 35ce7f29a44a (rcu: Create rcuo
    kthreads only for onlined CPUs), which was intended to allow
    RCU to avoid allocating unneeded kthreads on systems where the
    firmware says that there are more CPUs than are really present.
    This commit makes rcu_barrier() aware of the mismatch, so that
    it doesn't hang waiting for non-existent CPUs. "

Signed-off-by: Ingo Molnar <[email protected]>

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

- Fix report -F (abort, in_tx, mispredict, etc) segfaults for sample.data files
   without branch info (Jiri Olsa)

- Add patch that should have went in a previous patchkit to use global cache
   provided by libunwind (Namhyung Kim)

- Make CPUINFO_PROC an array to support different kernels, problem
   detected when the information reported via /proc/cpuinfo changed on ARM (Wang Nan)

- 'perf probe' --demangle typo fix and a new --quiet option (Masami Hiramatsu)

Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>

powerpc/fadump: Fix endianess issues in firmware assisted dump handling

Firmware-assisted dump (fadump) kernel code is not endian safe. The
below patch fixes this issue. Tested this patch with upstream kernel.
Below output shows crash tool successfully opening LE fadump vmcore.

    # crash vmlinux vmcore
    GNU gdb (GDB) 7.6
    This GDB was configured as "powerpc64le-unknown-linux-gnu"...

          KERNEL: vmlinux
        DUMPFILE: vmcore
     CPUS: 16
     DATE: Wed Dec 31 19:00:00 1969
          UPTIME: 00:03:28
    LOAD AVERAGE: 0.46, 0.86, 0.41
           TASKS: 268
        NODENAME: linux-dhr2
         RELEASE: 3.17.0-rc5-7-default
         VERSION: #6 SMP Tue Sep 30 01:06:34 EDT 2014
         MACHINE: ppc64le  (4116 Mhz)
          MEMORY: 40 GB
           PANIC: "Oops: Kernel access of bad area, sig: 11 [#1]" (check log for details)
     PID: 6223
         COMMAND: "bash"
     TASK: c0000009661b2500  [THREAD_INFO: c000000967ac0000]
     CPU: 2
           STATE: TASK_RUNNING (PANIC)

Signed-off-by: Hari Bathini <[email protected]>
[mpe: Make the comment in pSeries_lpar_hptab_clear() clearer]
Signed-off-by: Michael Ellerman <[email protected]>

mtd: cfi_cmdset_0001.c: fix resume for LH28F640BF chips

After '#echo mem > /sys/power/state' some devices can not be properly resumed
because apparently the MTD Partition Configuration Register has been reset
to default thus the rootfs cannot be mounted cleanly on resume.
An example of this can be found in the SA-1100 Developer's Manual at 9.5.3.3
where the second step of the Sleep Shutdown Sequence is described:
"An internal reset is applied to the SA-1100. All units are reset...".

As workaround we refresh the PCR value as done initially on chip setup.

This behavior and the fix are confirmed by our tests done on 2 different Zaurus
collie units with kernel 3.17.

Fixes: 812c5fa82bae: ("mtd: cfi_cmdset_0001.c: add support for Sharp LH28F640BF NOR")
Signed-off-by: Dmitry Eremin-Solenikov <[email protected]>
Signed-off-by: Andrea Adami <[email protected]>
Cc: <[email protected]> # 3.16+
Signed-off-by: Brian Norris <[email protected]>

mtd: omap: fix mtd devices not showing up

Since commit 6d178ef2fd5e ("mtd: nand: Move ELM driver and rename as
omap_elm"), I don't have any mtd devices present on my am335x. This
changes the link order of the omap_elm and omap2 objects, causing them
to probe in the wrong order.

To fix this, make elm_config defer probing until the omap_elm driver is
actually loaded.

Signed-off-by: Frans Klaver <[email protected]>
Acked-by: Roger Quadros <[email protected]>
Signed-off-by: Brian Norris <[email protected]>

powerpc: Fix section mismatch warning

Add __init to MMU_setup() which uses __initdata boot_command_line.
Also MMU_setup() is only called from MMU_init(), which is also __init.

Warning appeared since commit 3e47d1474c2b.

Fixes: 3e47d1474c2b ("powerpc: Remove powerpc specific cmd_line")
Signed-off-by: Fabian Frederick <[email protected]>
[mpe: Update changelog]
Signed-off-by: Michael Ellerman <[email protected]>

Merge branch 'akpm' (incoming from Andrew Morton)

Merge misc fixes from Andrew Morton:
"21 fixes"

* emailed patches from Andrew Morton <[email protected]>: (21 commits)
  mm/balloon_compaction: fix deflation when compaction is disabled
  sh: fix sh770x SCIF memory regions
  zram: avoid NULL pointer access in concurrent situation
  mm/slab_common: don't check for duplicate cache names
  ocfs2: fix d_splice_alias() return code checking
  mm: rmap: split out page_remove_file_rmap()
  mm: memcontrol: fix missed end-writeback page accounting
  mm: page-writeback: inline account_page_dirtied() into single caller
  lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
  drivers/rtc/rtc-bq32k.c: fix register value
  memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()
  drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock
  kernel/kmod: fix use-after-free of the sub_info structure
  drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc
  mm, thp: fix collapsing of hugepages on madvise
  drivers: of: add return value to of_reserved_mem_device_init()
  mm: free compound page with correct order
  gcov: add ARM64 to GCOV_PROFILE_ALL
  fsnotify: next_i is freed during fsnotify_unmount_inodes.
  mm/compaction.c: avoid premature range skip in isolate_migratepages_range
  ...

mm/balloon_compaction: fix deflation when compaction is disabled

If CONFIG_BALLOON_COMPACTION=n balloon_page_insert() does not link pages
with balloon and doesn't set PagePrivate flag, as a result
balloon_page_dequeue() cannot get any pages because it thinks that all
of them are isolated. Without balloon compaction nobody can isolate
ballooned pages. It's safe to remove this check.

Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management").
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Reported-by: Matt Mullins <[email protected]>
Cc: <[email protected]> [3.17]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

sh: fix sh770x SCIF memory regions

Resources scif1_resources & scif2_resources overlap. Actual SCIF region
size is 0x10.

This is regression from commit d850acf975be ("sh: Declare SCIF register
base and IRQ as resources")

Signed-off-by: Andriy Skulysh <[email protected]>
Acked-by: Laurent Pinchart <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

zram: avoid NULL pointer access in concurrent situation

There is a rare NULL pointer bug in mem_used_total_show() and
mem_used_max_store() in concurrent situation, like this:

zram is not initialized, process A is a mem_used_total reader which runs
periodically, while process B try to init zram.

process A process B
  access meta, get a NULL value
init zram, done
  init_done() is true
  access meta->mem_pool, get a NULL pointer BUG

This patch fixes this issue.

Signed-off-by: Weijie Yang <[email protected]>
Acked-by: Minchan Kim <[email protected]>
Acked-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/slab_common: don't check for duplicate cache names

The SLUB cache merges caches with the same size and alignment and there
was long standing bug with this behavior:

- create the cache named "foo"
- create the cache named "bar" (which is merged with "foo")
- delete the cache named "foo" (but it stays allocated because "bar"
   uses it)
- create the cache named "foo" again - it fails because the name "foo"
   is already used

That bug was fixed in commit 694617474e33 ("slab_common: fix the check
for duplicate slab names") by not warning on duplicate cache names when
the SLUB subsystem is used.

Recently, cache merging was implemented the with SLAB subsystem too, in
12220dea07f1 ("mm/slab: support slab merge")).  Therefore we need stop
checking for duplicate names even for the SLAB subsystem.

This patch fixes the bug by removing the check.

Signed-off-by: Mikulas Patocka <[email protected]>
Acked-by: Christoph Lameter <[email protected]>
Cc: Pekka Enberg <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

ocfs2: fix d_splice_alias() return code checking

d_splice_alias() can return a valid dentry, NULL or an ERR_PTR.
Currently the code checks not for ERR_PTR and will cuase an oops in
ocfs2_dentry_attach_lock(). Fix this by using IS_ERR_OR_NULL().

Signed-off-by: Richard Weinberger <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: rmap: split out page_remove_file_rmap()

page_remove_rmap() has too many branches on PageAnon() and is hard to
follow. Move the file part into a separate function.

Signed-off-by: Johannes Weiner <[email protected]>
Reviewed-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: memcontrol: fix missed end-writeback page accounting

Commit 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") changed
page migration to uncharge the old page right away.  The page is locked,
unmapped, truncated, and off the LRU, but it could race with writeback
ending, which then doesn't unaccount the page properly:

test_clear_page_writeback()              migration
                                           wait_on_page_writeback()
  TestClearPageWriteback()
                                           mem_cgroup_migrate()
                                             clear PCG_USED
  mem_cgroup_update_page_stat()
    if (PageCgroupUsed(pc))
      decrease memcg pages under writeback

  release pc->mem_cgroup->move_lock

The per-page statistics interface is heavily optimized to avoid a
function call and a lookup_page_cgroup() in the file unmap fast path,
which means it doesn't verify whether a page is still charged before
clearing PageWriteback() and it has to do it in the stat update later.

Rework it so that it looks up the page's memcg once at the beginning of
the transaction and then uses it throughout.  The charge will be
verified before clearing PageWriteback() and migration can't uncharge
the page as long as that is still set.  The RCU lock will protect the
memcg past uncharge.

As far as losing the optimization goes, the following test results are
from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
three times in a nested fashion, so that there are two negative passes
that don't account but still go through the new transaction overhead.
There is no actual difference:

old:     33.195102545 seconds time elapsed       ( +-  0.01% )
new:     33.199231369 seconds time elapsed       ( +-  0.03% )

The time spent in page_remove_rmap()'s callees still adds up to the
same, but the time spent in the function itself seems reduced:

     # Children      Self  Command        Shared Object       Symbol
old:     0.12%     0.11%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
new:     0.12%     0.08%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap

Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: <[email protected]> [3.17.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: page-writeback: inline account_page_dirtied() into single caller

A follow-up patch would have changed the call signature. To save the
trouble, just fold it instead.

Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: <[email protected]> [3.17.x]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()

If __bitmap_shift_left() or __bitmap_shift_right() are asked to shift by
a multiple of BITS_PER_LONG, they will try to shift a long value by
BITS_PER_LONG bits which is undefined. Change the functions to avoid
the undefined shift.

Coverity id: 1192175
Coverity id: 1192174
Signed-off-by: Jan Kara <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/rtc/rtc-bq32k.c: fix register value

Fix register value in bq32000 trickle charging.

Mike reported that I'm using wrong value in one trickle-charging case,
and after checking docs, I must admit he's right.

Signed-off-by: Pavel Machek <[email protected]>
Reported-by: Mike Bremford <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

memory-hotplug: clear pgdat which is allocated by bootmem in try_offline_node()

When hot adding the same memory after hot removal, the following
messages are shown:

  WARNING: CPU: 20 PID: 6 at mm/page_alloc.c:4968 free_area_init_node+0x3fe/0x426()
  ...
  Call Trace:
    dump_stack+0x46/0x58
    warn_slowpath_common+0x81/0xa0
    warn_slowpath_null+0x1a/0x20
    free_area_init_node+0x3fe/0x426
    hotadd_new_pgdat+0x90/0x110
    add_memory+0xd4/0x200
    acpi_memory_device_add+0x1aa/0x289
    acpi_bus_attach+0xfd/0x204
    acpi_bus_attach+0x178/0x204
    acpi_bus_scan+0x6a/0x90
    acpi_device_hotplug+0xe8/0x418
    acpi_hotplug_work_fn+0x1f/0x2b
    process_one_work+0x14e/0x3f0
    worker_thread+0x11b/0x510
    kthread+0xe1/0x100
    ret_from_fork+0x7c/0xb0

The detaled explanation is as follows:

When hot removing memory, pgdat is set to 0 in try_offline_node().  But
if the pgdat is allocated by bootmem allocator, the clearing step is
skipped.

And when hot adding the same memory, the uninitialized pgdat is reused.
But free_area_init_node() checks wether pgdat is set to zero.  As a
result, free_area_init_node() hits WARN_ON().

This patch clears pgdat which is allocated by bootmem allocator in
try_offline_node().

Signed-off-by: Yasuaki Ishimatsu <[email protected]>
Cc: Zhang Zhen <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Tang Chen <[email protected]>
Reviewed-by: Toshi Kani <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock

Fix unconditional initialization failure on non-exynos3250 SoCs.

Commit df9e26d093d3 ("rtc: s3c: add support for RTC of Exynos3250 SoC")
introduced rtc source clock support, but also added initialization
failure on SoCs, which doesn't need such clock.

Signed-off-by: Marek Szyprowski <[email protected]>
Reviewed-by: Chanwoo Choi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

kernel/kmod: fix use-after-free of the sub_info structure

Found this in the message log on a s390 system:

    BUG kmalloc-192 (Not tainted): Poison overwritten
    Disabling lock debugging due to kernel taint
    INFO: 0x00000000684761f4-0x00000000684761f7. First byte 0xff instead of 0x6b
    INFO: Allocated in call_usermodehelper_setup+0x70/0x128 age=71 cpu=2 pid=648
     __slab_alloc.isra.47.constprop.56+0x5f6/0x658
     kmem_cache_alloc_trace+0x106/0x408
     call_usermodehelper_setup+0x70/0x128
     call_usermodehelper+0x62/0x90
     cgroup_release_agent+0x178/0x1c0
     process_one_work+0x36e/0x680
     worker_thread+0x2f0/0x4f8
     kthread+0x10a/0x120
     kernel_thread_starter+0x6/0xc
     kernel_thread_starter+0x0/0xc
    INFO: Freed in call_usermodehelper_exec+0x110/0x1b8 age=71 cpu=2 pid=648
     __slab_free+0x94/0x560
     kfree+0x364/0x3e0
     call_usermodehelper_exec+0x110/0x1b8
     cgroup_release_agent+0x178/0x1c0
     process_one_work+0x36e/0x680
     worker_thread+0x2f0/0x4f8
     kthread+0x10a/0x120
     kernel_thread_starter+0x6/0xc
     kernel_thread_starter+0x0/0xc

There is a use-after-free bug on the subprocess_info structure allocated
by the user mode helper.  In case do_execve() returns with an error
____call_usermodehelper() stores the error code to sub_info->retval, but
sub_info can already have been freed.

Regarding UMH_NO_WAIT, the sub_info structure can be freed by
__call_usermodehelper() before the worker thread returns from
do_execve(), allowing memory corruption when do_execve() failed after
exec_mmap() is called.

Regarding UMH_WAIT_EXEC, the call to umh_complete() allows
call_usermodehelper_exec() to continue which then frees sub_info.

To fix this race the code needs to make sure that the call to
call_usermodehelper_freeinfo() is always done after the last store to
sub_info->retval.

Signed-off-by: Martin Schwidefsky <[email protected]>
Reviewed-by: Oleg Nesterov <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers/rtc/rtc-pm8xxx.c: rework to support pm8941 rtc

Adds support for RTC device inside PM8941 PMIC. The RTC in this PMIC
have two register spaces. Thus the rtc-pm8xxx is slightly reworked to
reflect these differences.

The register set for different PMIC chips are selected on DT compatible
string base.

[[email protected]: coding-style fixes]
[[email protected]: simplify and fix locking in pm8xxx_rtc_set_time()]
Signed-off-by: Stanimir Varbanov <[email protected]>
Cc: Alessandro Zummo <[email protected]>
Cc: Stephen Boyd <[email protected]>
Cc: Josh Cartwright <[email protected]>
Cc: Stanimir Varbanov <[email protected]>
Cc: Dan Carpenter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm, thp: fix collapsing of hugepages on madvise

If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.

This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
until the final action of madvise_behavior().  This causes the
khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
the vma had previously had VM_NOHUGEPAGE set.

Fix this by passing the correct vma flags to the khugepaged mm slot
handler.  There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.

It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_advise(), but I didn't want to introduce special case
behavior into madvise_behavior().  I think it's best to just let it
always set vma->vm_flags itself.

Signed-off-by: David Rientjes <[email protected]>
Reported-by: Suleiman Souhlal <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

drivers: of: add return value to of_reserved_mem_device_init()

Driver calling of_reserved_mem_device_init() might be interested if the
initialization has been successful or not, so add support for returning
error code.

This fixes a build warining caused by commit 7bfa5ab6fa1b ("drivers:
dma-coherent: add initialization from device tree"), which has been
merged without this change and without fixing function return value.

Fixes: 7bfa5ab6fa1b1 ("drivers: dma-coherent: add initialization from device tree")
Signed-off-by: Marek Szyprowski <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Grant Likely <[email protected]>
Cc: Laura Abbott <[email protected]>
Cc: Josh Cartwright <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Russell King <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm: free compound page with correct order

Compound page should be freed by put_page() or free_pages() with correct
order.  Not doing so will cause tail pages leaked.

The compound order can be obtained by compound_order() or use
HPAGE_PMD_ORDER in our case.  Some people would argue the latter is
faster but I prefer the former which is more general.

This bug was observed not just on our servers (the worst case we saw is
11G leaked on a 48G machine) but also on our workstations running Ubuntu
based distro.

  $ cat /proc/vmstat  | grep thp_zero_page_alloc
  thp_zero_page_alloc 55
  thp_zero_page_alloc_failed 0

This means there is (thp_zero_page_alloc - 1) * (2M - 4K) memory leaked.

Fixes: 97ae17497e99 ("thp: implement refcounting for huge zero page")
Signed-off-by: Yu Zhao <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Bob Liu <[email protected]>
Cc: <[email protected]> [3.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

gcov: add ARM64 to GCOV_PROFILE_ALL

Following up the arm testing of gcov, turns out gcov on ARM64 works fine
as well. Only change needed is adding ARM64 to Kconfig depends.

Tested with qemu and mach-virt

Signed-off-by: Riku Voipio <[email protected]>
Acked-by: Peter Oberparleiter <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

fsnotify: next_i is freed during fsnotify_unmount_inodes.

During file system stress testing on 3.10 and 3.12 based kernels, the
umount command occasionally hung in fsnotify_unmount_inodes in the
section of code:

                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
                        spin_unlock(&inode->i_lock);
                        continue;
                }

As this section of code holds the global inode_sb_list_lock, eventually
the system hangs trying to acquire the lock.

Multiple crash dumps showed:

The inode->i_state == 0x60 and i_count == 0 and i_sb_list would point
back at itself.  As this is not the value of list upon entry to the
function, the kernel never exits the loop.

To help narrow down problem, the call to list_del_init in
inode_sb_list_del was changed to list_del.  This poisons the pointers in
the i_sb_list and causes a kernel to panic if it transverse a freed
inode.

Subsequent stress testing paniced in fsnotify_unmount_inodes at the
bottom of the list_for_each_entry_safe loop showing next_i had become
free.

We believe the root cause of the problem is that next_i is being freed
during the window of time that the list_for_each_entry_safe loop
temporarily releases inode_sb_list_lock to call fsnotify and
fsnotify_inode_delete.

The code in fsnotify_unmount_inodes attempts to prevent the freeing of
inode and next_i by calling __iget.  However, the code doesn't do the
__iget call on next_i

if i_count == 0 or
if i_state & (I_FREEING | I_WILL_FREE)

The patch addresses this issue by advancing next_i in the above two cases
until we either find a next_i which we can __iget or we reach the end of
the list.  This makes the handling of next_i more closely match the
handling of the variable "inode."

The time to reproduce the hang is highly variable (from hours to days.) We
ran the stress test on a 3.10 kernel with the proposed patch for a week
without failure.

During list_for_each_entry_safe, next_i is becoming free causing
the loop to never terminate.  Advance next_i in those cases where
__iget is not done.

Signed-off-by: Jerry Hoemann <[email protected]>
Cc: Jeff Kirsher <[email protected]>
Cc: Ken Helias <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

mm/compaction.c: avoid premature range skip in isolate_migratepages_range

Commit edc2ca612496 ("mm, compaction: move pageblock checks up from
isolate_migratepages_range()") commonizes isolate_migratepages variants
and make them use isolate_migratepages_block().

isolate_migratepages_block() could stop the execution when enough pages
are isolated, but, there is no code in isolate_migratepages_range() to
handle this case. In the result, even if isolate_migratepages_block()
returns prematurely without checking all pages in the range,

isolate_migratepages_block() is called repeately on the following
pageblock and some pages in the previous range are skipped to check.
Then, CMA is failed frequently due to this fact.

To fix this problem, this patch let isolate_migratepages_range() know
the situation that enough pages are isolated and stop the isolation in
that case.

Note that isolate_migratepages() has no such problem, because, it always
stops the isolation after just one call of isolate_migratepages_block().

Signed-off-by: Joonsoo Kim <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Michal Nazarewicz <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.

Commit ff7ee93f4715 ("cgroup/kmemleak: Annotate alloc_page() for cgroup
allocations") introduces kmemleak_alloc() for alloc_page_cgroup(), but
corresponding kmemleak_free() is missing, which makes kmemleak be
wrongly disabled after memory offlining.  Log is pasted at the end of
this commit message.

This patch add kmemleak_free() into free_page_cgroup().  During page
offlining, this patch removes corresponding entries in kmemleak rbtree.
After that, the freed memory can be allocated again by other subsystems
without killing kmemleak.

  bash # for x in 1 2 3 4; do echo offline > /sys/devices/system/memory/memory$x/state ; sleep 1; done ; dmesg | grep leak

  Offlined Pages 32768
  kmemleak: Cannot insert 0xffff880016969000 into the object search tree (overlaps existing)
  CPU: 0 PID: 412 Comm: sleep Not tainted 3.17.0-rc5+ #86
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x46/0x58
    create_object+0x266/0x2c0
    kmemleak_alloc+0x26/0x50
    kmem_cache_alloc+0xd3/0x160
    __sigqueue_alloc+0x49/0xd0
    __send_signal+0xcb/0x410
    send_signal+0x45/0x90
    __group_send_sig_info+0x13/0x20
    do_notify_parent+0x1bb/0x260
    do_exit+0x767/0xa40
    do_group_exit+0x44/0xa0
    SyS_exit_group+0x17/0x20
    system_call_fastpath+0x16/0x1b

  kmemleak: Kernel memory leak detector disabled
  kmemleak: Object 0xffff880016900000 (size 524288):
  kmemleak:   comm "swapper/0", pid 0, jiffies 4294667296
  kmemleak:   min_count = 0
  kmemleak:   count = 0
  kmemleak:   flags = 0x1
  kmemleak:   checksum = 0
  kmemleak:   backtrace:
        log_early+0x63/0x77
        kmemleak_alloc+0x4b/0x50
        init_section_page_cgroup+0x7f/0xf5
        page_cgroup_init+0xc5/0xd0
        start_kernel+0x333/0x408
        x86_64_start_reservations+0x2a/0x2c
        x86_64_start_kernel+0xf5/0xfc

Fixes: ff7ee93f4715 (cgroup/kmemleak: Annotate alloc_page() for cgroup allocations)
Signed-off-by: Wang Nan <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: <[email protected]> [3.2+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

inet: frags: remove the WARN_ON from inet_evict_bucket

The WARN_ON in inet_evict_bucket can be triggered by a valid case:
inet_frag_kill and inet_evict_bucket can be running in parallel on the
same queue which means that there has been at least one more ref added
by a previous inet_frag_find call, but inet_frag_kill can delete the
timer before inet_evict_bucket which will cause the WARN_ON() there to
trigger since we'll have refcnt!=1. Now, this case is valid because the
queue is being "killed" for some reason (removed from the chain list and
its timer deleted) so it will get destroyed in the end by one of the
inet_frag_put() calls which reaches 0 i.e. refcnt is still valid.

CC: Florian Westphal <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Patrick McLean <[email protected]>
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Reported-by: Patrick McLean <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

inet: frags: fix a race between inet_evict_bucket and inet_frag_kill

When the evictor is running it adds some chosen frags to a local list to
be evicted once the chain lock has been released but at the same time
the *frag_queue can be running for some of the same queues and it
may call inet_frag_kill which will wait on the chain lock and
will then delete the queue from the wrong list since it was added in the
eviction one. The fix is simple - check if the queue has the evict flag
set under the chain lock before deleting it, this is safe because the
evict flag is set only under that lock and having the flag set also means
that the queue has been detached from the chain list, so no need to delete
it again.
An important note to make is that we're safe w.r.t refcnt because
inet_frag_kill and inet_evict_bucket will sync on the del_timer operation
where only one of the two can succeed (or if the timer is executing -
none of them), the cases are:
1. inet_frag_kill succeeds in del_timer
- then the timer ref is removed, but inet_evict_bucket will not add
   this queue to its expire list but will restart eviction in that chain
2. inet_evict_bucket succeeds in del_timer
- then the timer ref is kept until the evictor "expires" the queue, but
   inet_frag_kill will remove the initial ref and will set
   INET_FRAG_COMPLETE which will make the frag_expire fn just to remove
   its ref.
In the end all of the queue users will do an inet_frag_put and the one
that reaches 0 will free it. The refcount balance should be okay.

CC: Florian Westphal <[email protected]>
CC: Eric Dumazet <[email protected]>
CC: Patrick McLean <[email protected]>
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue")
Suggested-by: Eric Dumazet <[email protected]>
Reported-by: Patrick McLean <[email protected]>
Tested-by: Patrick McLean <[email protected]>
Signed-off-by: Nikolay Aleksandrov <[email protected]>
Reviewed-by: Florian Westphal <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

ARM: OMAP2+: Warn about deprecated legacy booting mode

We're moving omaps to use device tree based booting and already have
omap2, omap4, omap5, am335x and am437x booting in device tree only
mode.

Only omap3 still has legacy booting still around and we really want
to make that device tree only. So let's add a warning about deprecated
legacy booting so we get people to upgrade their boards to use device
tree based booting and find out about any remaining issues.

Note that for most boards we already have the .dts file and those can
be booted with without changing the bootloader using the appended
DTB mode.

Acked-By: Sebastian Reichel <[email protected]>
Reviewed-by: Aaro Koskinen <[email protected]>
Reviewed-by: Javier Martinez Canillas <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>

ARM: omap2plus_defconfig: Fix errors with NAND BCH

Looks like we need to have BCH enabled to get NAND
working and to avoid getting:

nand: error: CONFIG_MTD_NAND_ECC_BCH not enabled

Signed-off-by: Tony Lindgren <[email protected]>

cnic: Update the rcu_access_pointer() usages

1. Remove the rcu_read_lock/unlock around rcu_access_pointer
2. Replace the rcu_dereference with rcu_access_pointer

Signed-off-by: Tej Parkash <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block layer fixes from Jens Axboe:
"A small collection of fixes for the current kernel.  This contains:

   - Two error handling fixes from Jan Kara.  One for null_blk on
     failure to add a device, and the other for the block/scsi_ioctl
     SCSI_IOCTL_SEND_COMMAND fixing up the error jump point.

   - A commit added in the merge window for the bio integrity bits
     unfortunately disabled merging for all requests if
     CONFIG_BLK_DEV_INTEGRITY wasn't set.  Reverse the logic, so that
     integrity checking wont disallow merges when not enabled.

   - A fix from Ming Lei for merging and generating too many segments.
     This caused a BUG in virtio_blk.

   - Two error handling printk() fixups from Robert Elliott, improving
     the information given when we rate limit.

   - Error handling fixup on elevator_init() failure from Sudip
     Mukherjee.

   - A fix from Tony Battersby, fixing up a memory leak in the
     scatterlist handling with scsi-mq"

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: Fix merge logic when CONFIG_BLK_DEV_INTEGRITY is not defined
  lib/scatterlist: fix memory leak with scsi-mq
  block: fix wrong error return in elevator_init()
  scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
  null_blk: Cleanup error recovery in null_add_dev()
  blk-merge: recaculate segment if it isn't less than max segments
  fs: clarify rate limit suppressed buffer I/O errors
  fs: merge I/O error prints into one line

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid

Pull HID fixes from Jiri Kosina:
- workarounds for a couple of misbehaving Elan Touchscreens, by Adel
   Gadllah
- fix for TransducerSerialNumber field implementation, by Jason Gerecke
- a couple of new HID usages (added by HUT), by Olivier Gay

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
  HID: input: Fix TransducerSerialNumber implementation
  HID: add keyboard input assist hid usages
  HID: usbhid: enable always-poll quirk for Elan Touchscreen 016f
  HID: usbhid: enable always-poll quirk for Elan Touchscreen 009b

cxgb4vf: Replace repetitive pci device ID's with right ones

Replaced repetive Device ID's which got added in commit b961f9a48844ecf3
("cxgb4vf: Remove superfluous "idx" parameter of CH_DEVICE() macro")

Signed-off-by: Hariprasad Shenai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull Integrity subsystem fix from James Morris:
"These changes fix a bug in xattr handling, where the evm and ima
  inode_setxattr() functions do not check for empty xattrs being passed
  from userspace (leading to user-triggerable null pointer
  dereferences)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  evm: check xattr value length and type in evm_inode_setxattr()
  ima: check xattr value length and type in the ima_inode_setxattr()

ipv6: notify userspace when we added or changed an ipv6 token

NetworkManager might want to know that it changed when the router advertisement
arrives.

Signed-off-by: Lubomir Rintel <[email protected]>
Cc: Hannes Frederic Sowa <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

sch_pie: schedule the timer after all init succeed

Cc: Vijay Subramanian <[email protected]>
Cc: David S. Miller <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Acked-by: Eric Dumazet <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux

Pull powerpc updates from Michael Ellerman:
"There's some bug fixes or cleanups to facilitate fixes, a MAINTAINERS
  update, and a new syscall (bpf)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/numa: ensure per-cpu NUMA mappings are correct on topology update
  powerpc/numa: use cached value of update->cpu in update_cpu_topology
  cxl: Fix PSL error due to duplicate segment table entries
  powerpc/mm: Use appropriate ESID mask in copro_calculate_slb()
  cxl: Refactor cxl_load_segment() and find_free_sste()
  cxl: Disable secondary hash in segment table
  Revert "powerpc/powernv: Fix endian bug in LPC bus debugfs accessors"
  powernv: Use _GLOBAL_TOC for opal wrappers
  powerpc: Wire up sys_bpf() syscall
  MAINTAINERS: nx-842 driver maintainer change
  powerpc/mm: Remove redundant #if case
  powerpc/mm: Fix build error with hugetlfs disabled

ARM: 8180/1: mm: implement no-highmem fast path in kmap_atomic_pfn()

Since CONFIG_HIGHMEM got enabled on ARMv5 Kirkwood, we have noticed a
very significant drop in networking performance. The test were
conducted on an OpenBlocks A7 board. Without this patch, the outgoing
performance measured with iperf are:

- highmem OFF, TSO OFF   544 Mbit/s
- highmem OFF, TSO ON   942 Mbit/s
- highmem ON,  TSO OFF   306 Mbit/s
- highmem ON,  TSO ON    246 Mbit/s

On this Kirkwood platform, the L2 cache is a Feroceon cache, and with
this cache, all the range operations have to be done on virtual
addresses and not physical addresses. Therefore, whenever
CONFIG_HIGHMEM is enabled, the cache maintenance operations call
kmap_atomic_pfn() and kunmap_atomic().

However, kmap_atomic_pfn() does not implement the same fast path for
non-highmem pages as the one implemented in kmap_atomic(), and this is
one of the reason for the performance drop. While this patch does not
fully restore the performances, it clearly improves them a lot:

                      without patch  with patch

- highmem ON, TSO OFF   306 Mbit/s     387 Mbit/s
- highmem ON, TSO ON    246 Mbit/s     434 Mbit/s

We're still far from the !CONFIG_HIGHMEM performances, but it does
improve a bit the situation.

Thanks a lot to Ezequiel Garcia and Gregory Clement for all the
testing work around this topic.

Signed-off-by: Thomas Petazzoni <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 8183/1: l2c: Improve l2c310_of_parse() error message

Russell King suggested [1]:

"I'd ask for one change.  Please make all these messages start with
"L2C-310 OF" not "PL310 OF:".  The device is described in ARM
documentation as a L2C-310 not PL310.  (Also note the : is dropped
too - most of the other messages don't have the : either.)

The:

"PL310 OF: cache setting yield illegal associativity
PL310 OF: -1073346556 calculated, only 8 and 16 legal"

message could also be changed to something like:

"L2C-310 OF cache associativity %d invalid, only 8 or 16 permittedn"

[1] http://www.spinics.net/lists/arm-kernel/msg372776.html

Signed-off-by: Fabio Estevam <[email protected]>
Signed-off-by: Russell King <[email protected]>

ARM: 8181/1: Drop extra return statement

Commit 513510ddba9650fc7da456eefeb0ead7632324f6
(common: dma-mapping: introduce common remapping functions)
managed to end up with an extra return statement from the
original patch. Drop it.

Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Russell King <[email protected]>

Merge tag 'usb-serial-3.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB-serial fixes for v3.18-rc3

These updates remove two allocations of unused buffers from kobil_sct
and add some new device ids.

Signed-off-by: Johan Hovold <[email protected]>