Chunyu Hu [Tue, 3 May 2016 11:34:34 +0000 (19:34 +0800)]
tracing: Don't display trigger file for events that can't be enabled
Currently register functions for events will be called
through the 'reg' field of event class directly without
any check when seting up triggers.
Triggers for events that don't support register through
debug fs (events under events/ftrace are for trace-cmd to
read event format, and most of them don't have a register
function except events/ftrace/functionx) can't be enabled
at all, and an oops will be hit when setting up trigger
for those events, so just not creating them is an easy way
to avoid the oops.
Ping Cheng [Tue, 3 May 2016 04:17:34 +0000 (21:17 -0700)]
HID: wacom: add missed stylus_in_proximity line back
Commit 7e12978 ("HID: wacom: break out wacom_intuos_get_tool_type") by accident
removed stylus_in_proximity flag for Intuos series while shuffling the code
around.
Fix that by reintroducing that flag setting in wacom_intuos_inout(), where
it originally was.
Neil Horman [Mon, 2 May 2016 16:20:15 +0000 (12:20 -0400)]
netem: Segment GSO packets on enqueue
This was recently reported to me, and reproduced on the latest net kernel,
when attempting to run netperf from a host that had a netem qdisc attached
to the egress interface:
The problem occurs because netem is not prepared to handle GSO packets (as it
uses skb_checksum_help in its enqueue path, which cannot manipulate these
frames).
The solution I think is to simply segment the skb in a simmilar fashion to the
way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes.
When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt
the first segment, and enqueue the remaining ones.
tested successfully by myself on the latest net kernel, to which this applies
David S. Miller [Tue, 3 May 2016 04:17:38 +0000 (00:17 -0400)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:
====================
In this small batch of patches you have:
- a fix for our Distributed ARP Table that makes sure that the input
provided to the hash function during a query is the same as the one
provided during an insert (so to prevent false negatives), by Antonio
Quartulli
- a fix for our new protocol implementation B.A.T.M.A.N. V that ensures
that a hard interface is properly re-activated when it is brought down
and then up again, by Antonio Quartulli
- two fixes respectively to the reference counting of the tt_local_entry
and neigh_node objects, by Sven Eckelmann. Such bug is rather severe
as it would prevent the netdev objects references by batman-adv from
being released after shutdown.
====================
Linus Torvalds [Mon, 2 May 2016 19:46:42 +0000 (12:46 -0700)]
Minimal fix-up of bad hashing behavior of hash_64()
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.
In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much. In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.
There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem. It
simply picks a much better constant multiplier, spreading the bits out a
lot better.
NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive. The bigger hashing
cleanup will replace the 32-bit case with something better.
The new constants were picked by George Spelvin who wrote that bigger
cleanup series. I just picked out the constants and part of the comment
from that series.
Linus Torvalds [Mon, 2 May 2016 19:22:51 +0000 (12:22 -0700)]
Merge tag 'md/4.6-rc6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
Pull MD fixes from Shaohua Li:
"This update includes several trival fixes. The only important one is
to fix MD bio merge, which has big performance impact"
* tag 'md/4.6-rc6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
raid5: delete unnecessary warnning
MD: make bio mergeable
md/raid0: remove empty line printk from dump_zones
md/raid0: fix uninitialized variable bug
Linus Torvalds [Mon, 2 May 2016 16:54:22 +0000 (09:54 -0700)]
Merge tag 'gpio-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"Here are some late but important fixes for the v4.6 kernel series.
ACPI and RCAR, so two driver fixes (PM related) and a self-evident
string lookup fix for ACPI GPIOs:
- A serious ACPI fix targeted for stable: lookup strings were being
free'd.
- Revert two patches from the RCAR driver"
* tag 'gpio-v4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
gpiolib-acpi: Duplicate con_id string when adding it to the crs lookup list
Revert "gpio: rcar: Fine-grained Runtime PM support"
Revert "gpio: rcar: Add Runtime PM handling for interrupts"
1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips,
from Sara Sharon.
2) Fix SKB size checks in batman-adv stack on receive, from Sven
Eckelmann.
3) Leak fix on mac80211 interface add error paths, from Johannes Berg.
4) Cannot invoke napi_disable() with BH disabled in myri10ge driver,
fix from Stanislaw Gruszka.
5) Fix sign extension problem when computing feature masks in
net_gso_ok(), from Marcelo Ricardo Leitner.
6) lan78xx driver doesn't count packets and packet lengths in its
statistics properly, fix from Woojung Huh.
7) Fix the buffer allocation sizes in pegasus USB driver, from Petko
Manolov.
8) Fix refcount overflows in bpf, from Alexei Starovoitov.
9) Unified dst cache handling introduced a preempt warning in
ip_tunnel, fix by resetting rather then setting the cached route.
From Paolo Abeni.
10) Listener hash collision test fix in soreuseport, from Craig Gallak
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
tipc: only process unicast on intended node
cxgb3: fix out of bounds read
net/smscx5xx: use the device tree for mac address
soreuseport: Fix TCP listener hash collision
net: l2tp: fix reversed udp6 checksum flags
ip_tunnel: fix preempt warning in ip tunnel creation/updating
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
drivers: net: cpsw: use of_phy_connect() in fixed-link case
dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
drivers: net: cpsw: fix segfault in case of bad phy-handle
drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
gre: reject GUE and FOU in collect metadata mode
pegasus: fixes reported packet length
pegasus: fixes URB buffer allocation size;
...
3) Allow proper auto-loading of VIO devices, from John Paul Adrian
Glaubitz.
4) Recognize Sonoma cpus, from Khalid Aziz.
5) Fix bootup regressions caused by syscall trace fixes made recently.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Fix bootup regressions on some Kconfig combinations.
sparc64: recognize and support Sonoma CPU type
sparc: Implement and wire up vio_hotplug for vio.
sparc: Implement and wire up modalias_show for vio.
sparc/pci: Refactor dev_archdata initialization into pci_init_dev_archdata
sparc/defconfigs: Remove CONFIG_IPV6_PRIVACY
sparc: Write up preadv2/pwritev2 syscalls.
sparc/PCI: Fix for panic while enabling SR-IOV
Dan Williams [Mon, 2 May 2016 16:11:53 +0000 (09:11 -0700)]
nfit: fix translation of command status results
When transportation of the command completes successfully, it indicates
that the 'status' result is valid. Fix the missed checking and
translation of the status field at the end of acpi_nfit_ctl().
Otherwise, we fail to handle reported errors and assume commands
complete successfully.
Devices that after having been reset during resume need to be rebound
due to a missing reset_resume callback, are now left in a suspended
state. This specifically broke resume of common USB-serial devices,
which are now unusable after system suspend (until disconnected and
reconnected) when USB persist is enabled.
During resume, usb_resume_interface will set the needs_binding flag for
such interfaces, but unlike system resume, run-time resume does not
honour it.
This patch fixes the issue where the mxs_ocotp_read is reading
the ocotp in reg_size steps but decrements the remaining size
by 1. The number of iterations is thus four times higher,
overwriting the area behind the output buffer.
Marek Szyprowski [Thu, 28 Apr 2016 10:25:04 +0000 (07:25 -0300)]
[media] media: s3c-camif: fix deadlock on driver probe()
Commit 0c426c472b5585ed6e59160359c979506d45ae49 ("[media] media: Always
keep a graph walk large enough around") changed
media_device_register_entity() function to take mdev->graph_mutex. This
causes deadlock in driver probe, which calls (indirectly) this function
with ->graph_mutex taken. This patch removes taking ->graph_mutex in
driver probe to avoid deadlock. Other drivers don't take ->graph_mutex
for entity registration, so this change should be safe.
Marek Szyprowski [Thu, 28 Apr 2016 10:25:03 +0000 (07:25 -0300)]
[media] media: exynos4-is: fix deadlock on driver probe
Commit 0c426c472b5585ed6e59160359c979506d45ae49 ("[media] media: Always
keep a graph walk large enough around") changed
media_device_register_entity() function to take mdev->graph_mutex. This
causes deadlock in driver probe, which calls (indirectly) this function
with ->graph_mutex taken. This patch removes taking ->graph_mutex in
driver probe to avoid deadlock. Other drivers don't take ->graph_mutex
for entity registration, so this change should be safe.
cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
Commit 41cfd64cf49fc "Update frequencies of policy->cpus only from
->set_policy()" changed the way the intel_pstate driver's ->set_policy
callback updates the HWP (hardware-managed P-states) settings.
A side effect of it is that if those settings are modified on the
boot CPU during system suspend and wakeup, they will never be
restored during subsequent system resume.
To address this problem, allow cpufreq drivers that don't provide
->target or ->target_index callbacks to use ->suspend and ->resume
callbacks and add a ->resume callback to intel_pstate to restore
the HWP settings on the CPUs that belong to the given policy.
Fixes: 41cfd64cf49fc "Update frequencies of policy->cpus only from ->set_policy()" Tested-by: Srinivas Pandruvada <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Acked-by: Viresh Kumar <[email protected]>
iptunnel_pull_header expects that IP header was already pulled; with this
expectation, it pulls the tunnel header. This is not true in gre_err.
Furthermore, ipv4_update_pmtu and ipv4_redirect expect that skb->data points
to the IP header.
We cannot pull the tunnel header in this path. It's just a matter of not
calling iptunnel_pull_header - we don't need any of its effects.
Fixes: bda7bb463436 ("gre: Allow multiple protocol listener for gre protocol.") Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Tim Bingham [Fri, 29 Apr 2016 17:30:23 +0000 (13:30 -0400)]
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
Prior to commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op
when !DEBUG") the implementation of net_dbg_ratelimited() was buggy
for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases.
The bug was that net_ratelimit() was being called and, despite
returning true, nothing was being printed to the console. This
resulted in messages like the following -
"net_ratelimit: %d callbacks suppressed"
with no other output nearby.
After commit d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when
!DEBUG") the bug is fixed for the DEBUG case. However, there's no
output at all for CONFIG_DYNAMIC_DEBUG case.
This patch restores debug output (if enabled) for the
CONFIG_DYNAMIC_DEBUG case.
Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG
case. The implementation takes care to check that dynamic debugging is
enabled before calling net_ratelimit().
Fixes: d92cff89a0c8 ("net_dbg_ratelimited: turn into no-op when !DEBUG") Signed-off-by: Tim Bingham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Anton Blanchard [Fri, 29 Apr 2016 22:29:27 +0000 (08:29 +1000)]
powerpc: Fix bad inline asm constraint in create_zero_mask()
In create_zero_mask() we have:
addi %1,%2,-1
andc %1,%1,%2
popcntd %0,%1
using the "r" constraint for %2. r0 is a valid register in the "r" set,
but addi X,r0,X turns it into an li:
li r7,-1
andc r7,r7,r0
popcntd r4,r7
Fix this by using the "b" constraint, for which r0 is not a valid
register.
This was found with a kernel build using gcc trunk, narrowed down to
when -frename-registers was enabled at -O2. It is just luck however
that we aren't seeing this on older toolchains.
Thanks to Segher for working with me to find this issue.
Cc: [email protected] Fixes: d0cebfa650a0 ("powerpc: word-at-a-time optimization for 64-bit Little Endian") Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
Hamish Martin [Fri, 29 Apr 2016 14:40:24 +0000 (10:40 -0400)]
tipc: only process unicast on intended node
We have observed complete lock up of broadcast-link transmission due to
unacknowledged packets never being removed from the 'transmq' queue. This
is traced to nodes having their ack field set beyond the sequence number
of packets that have actually been transmitted to them.
Consider an example where node 1 has sent 10 packets to node 2 on a
link and node 3 has sent 20 packets to node 2 on another link. We
see examples of an ack from node 2 destined for node 3 being treated as
an ack from node 2 at node 1. This leads to the ack on the node 1 to node
2 link being increased to 20 even though we have only sent 10 packets.
When node 1 does get around to sending further packets, none of the
packets with sequence numbers less than 21 are actually removed from the
transmq.
To resolve this we reinstate some code lost in commit d999297c3dbb ("tipc:
reduce locking scope during packet reception") which ensures that only
messages destined for the receiving node are processed by that node. This
prevents the sequence numbers from getting out of sync and resolves the
packet leakage, thereby resolving the broadcast-link transmission
lock-ups we observed.
While we are aware that this change only patches over a root problem that
we still haven't identified, this is a sanity test that it is always
legitimate to do. It will remain in the code even after we identify and
fix the real problem.
Michal Schmidt [Fri, 29 Apr 2016 09:06:50 +0000 (11:06 +0200)]
cxgb3: fix out of bounds read
An out of bounds read of 2 bytes was discovered in cxgb3 with KASAN.
t3_config_rss() expects both arrays it gets as parameters to have
terminators. setup_rss(), the caller, forgets to add a terminator to
one of the arrays. Thankfully the iteration in t3_config_rss() stops
anyway, but in the last iteration the check for the terminator
is an out of bounds read.
This takes the MAC address for smsc75xx/smsc95xx USB network devices
from a the device tree. This is required to get a usable persistent
address on the popular beagleboard, whose hardware designers
accidentally forgot that an ethernet device really requires an a
MAC address to be functional.
The Raspberry Pi also ships smsc9514 without a serial EEPROM, stores
the MAC address in ROM accessible via VC4 firmware.
The smsc75xx and smsc95xx drivers are just two copies of the
same code, so better fix both.
[[email protected]: updated to use of_get_property() as per suggestion from
Arnd, reworded the message and comments a bit]
I forgot to include a check for listener port equality when deciding
if two sockets should belong to the same reuseport group. This was
not caught previously because it's only necessary when two listening
sockets for the same user happen to hash to the same listener bucket.
The same error does not exist in the UDP path.
Fixes: c125e80b8868("soreuseport: fast reuseport TCP socket selection") Signed-off-by: Craig Gallek <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Wang Shanker [Thu, 28 Apr 2016 17:29:43 +0000 (01:29 +0800)]
net: l2tp: fix reversed udp6 checksum flags
This patch fixes a bug which causes the behavior of whether to ignore
udp6 checksum of udp6 encapsulated l2tp tunnel contrary to what
userspace program requests.
When the flag `L2TP_ATTR_UDP_ZERO_CSUM6_RX` is set by userspace, it is
expected that udp6 checksums of received packets of the l2tp tunnel
to create should be ignored. In `l2tp_netlink.c`:
`l2tp_nl_cmd_tunnel_create()`, `cfg.udp6_zero_rx_checksums` is set
according to the flag, and then passed to `l2tp_core.c`:
`l2tp_tunnel_create()` and then `l2tp_tunnel_sock_create()`. In
`l2tp_tunnel_sock_create()`, `udp_conf.use_udp6_rx_checksums` is set
the same to `cfg.udp6_zero_rx_checksums`. However, if we want the
checksum to be ignored, `udp_conf.use_udp6_rx_checksums` should be set
to `false`, i.e. be set to the contrary. Similarly, the same should be
done to `udp_conf.use_udp6_tx_checksums`.
Dan Carpenter [Fri, 15 Apr 2016 14:45:10 +0000 (17:45 +0300)]
virtio: Silence uninitialized variable warning
Smatch complains that we might not initialize "queue". The issue is
callers like setup_vq() from virtio_pci_modern.c where "num" could be
something like 2 and "vring_align" is 64. In that case, vring_size() is
less than PAGE_SIZE. It won't happen in real life, but we're getting
the value of "num" from a register so it's not really possible to tell
what value it holds with static analysis.
Linus Torvalds [Sun, 1 May 2016 01:57:42 +0000 (18:57 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal fixes from Eduardo Valentin:
"A couple of minor fixes for the thermal subsystem.
Specifics in this pull request:
- Fixes in hisilicon thermal driver
- More fixes of unsigned to int type change in thermal_core.c"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: use %d to print S32 parameters
thermal: hisilicon: increase temperature resolution
Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read()
On the consumer side, we have interrupt driven flow management of the
producer. It is sufficient to base the signaling decision on the
amount of space that is available to write after the read is complete.
The current code samples the previous available space and uses this
in making the signaling decision. This state can be stale and is
unnecessary. Since the state can be stale, we end up not signaling
the host (when we should) and this can result in a hang. Fix this
problem by removing the unnecessary check. I would like to thank
Arseney Romanenko <[email protected]> for pointing out this issue.
Also, issue a full memory barrier before making the signaling descision
to correctly deal with potential reordering of the write (read index)
followed by the read of pending_sz.
Dan Williams [Sat, 30 Apr 2016 20:07:06 +0000 (13:07 -0700)]
libnvdimm, pfn: fix memmap reservation sizing
When configuring a pfn-device instance to allocate the memmap array it
needs to account for the fact that vmemmap_populate_hugepages()
allocates struct page blocks in HPAGE_SIZE chunks. We need to align the
reserved area size to 2MB otherwise arch_add_memory() runs out of memory
while establishing the memmap:
Ville Syrjälä [Mon, 25 Apr 2016 13:01:19 +0000 (16:01 +0300)]
gpiolib-acpi: Duplicate con_id string when adding it to the crs lookup list
Calling gpiod_get() from a module and then unloading the module leads to an
oops due to acpi_can_fallback_to_crs() storing the pointer to the passed
'con_id' string onto acpi_crs_lookup_list. The next guy to come along will then
try to access the string but the memory may now be gone with the module.
Make a copy of the passed string instead, and store the copy on the list.
Merge tag 'powerpc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A few more powerpc fixes for 4.6:
- cxl: Keep IRQ mappings on context teardown from Michael Neuling
- cxl: Poll for outstanding IRQs when detaching a context from
Michael Neuling
- Wire up preadv2 and pwritev2 syscalls from Rui Salvaterra"
* tag 'powerpc-4.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: wire up preadv2 and pwritev2 syscalls
cxl: Poll for outstanding IRQs when detaching a context
cxl: Keep IRQ mappings on context teardown
Merge tag 'edac_fix_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
Pull EDAC fix from Borislav Petkov:
"Make sure sb_edac and i7core_edac do not terminate MCE processing on
the decoding callchain prematurely"
* tag 'edac_fix_for_4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Merge tag 'pm+acpi-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One revert of a recent cpufreq commit that introduced a regression and
a fix for intel_pstate's Turbo Activation Ratio handling code.
Specifics:
- Revert cpufreq commit that attempted to fix a problem in the
ondemand/conservative governor code, but did that incorrectly and
introduced another problem instead (Rafael Wysocki).
- Fix incorrect decoding of MSR contents related to the Turbo
Activation Ratio (TAR) handling in the intel_pstate driver
(Srinivas Pandruvada)"
* tag 'pm+acpi-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"A few fixes all over the place:
radeon is probably the biggest standout, it's a fix for screen
corruption or hung black outputs so I thought it was worth pulling in.
Otherwise some amdgpu power control fixes, some misc vmwgfx fixes, one
etnaviv fix, one virtio-gpu fix, two DP MST fixes, and a single TTM
fix"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
drm/virtio: send vblank event after crtc updates
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/etnaviv: don't move linear memory window on 3D cores without MC2.0
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Final set of -rc fixes for 4.6.
I've collected up a number of patches that are all pretty small with
the exception of only a couple. The hfi1 driver has a number of
important patches, and it is what really drives the line count of this
pull request up. These are all small and I've got this kernel built
and running in the test lab (I have most of the hardware, I think nes
is the only thing in this patch set that I can't say I've personally
tested and have up and running).
Summary:
- A number of collected fixes for oopses, memory corruptions,
deadlocks, etc. All of these fixes are small (many only 5-10
lines), obvious, and tested.
- Fix for the security issue related to the use of write for
bi-directional communications"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
RDMA/nes: don't leak skb if carrier down
IB/security: Restrict use of the write() interface
IB/hfi1: Use kernel default llseek for ui device
IB/hfi1: Don't attempt to free resources if initialization failed
IB/hfi1: Fix missing lock/unlock in verbs drain callback
IB/rdmavt: Fix send scheduling
IB/hfi1: Prevent unpinning of wrong pages
IB/hfi1: Fix deadlock caused by locking with wrong scope
IB/hfi1: Prevent NULL pointer deferences in caching code
MAINTAINERS: Update iser/isert maintainer contact info
IB/mlx5: Expose correct max_sge_rd limit
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
iw_cxgb4: handle draining an idle qp
iw_cxgb3: initialize ibdev.iwcm->ifname for port mapping
iw_cxgb4: initialize ibdev.iwcm->ifname for port mapping
IB/core: Don't drain non-existent rq queue-pair
IB/core: Fix oops in ib_cache_gid_set_default_gid
Shaohua Li [Fri, 29 Apr 2016 21:18:03 +0000 (14:18 -0700)]
raid5: delete unnecessary warnning
If device has R5_LOCKED set, it's legit device has R5_SkipCopy set and page !=
orig_page. After R5_LOCKED is clear, handle_stripe_clean_event will clear the
SkipCopy flag and set page to orig_page. So the warning is unnecessary.
Kevin Hilman [Fri, 29 Apr 2016 19:04:02 +0000 (12:04 -0700)]
Merge tag 'sunxi-fixes-for-4.6' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes
Allwinner fixes for 4.6
A single regulator fix
* tag 'sunxi-fixes-for-4.6' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
ARM: dts: sun8i-q8-common: Do not set constraints on dc1sw regulator
Arnd Bergmann [Tue, 15 Mar 2016 21:34:45 +0000 (22:34 +0100)]
ARM: davinci: only use NVMEM when available
The davinci platform contains code that calls into the nvmem
subsystem, but that might be a loadable module, causing a
link error:
arch/arm/mach-davinci/built-in.o: In function `davinci_get_mac_addr':
:(.text+0x1088): undefined reference to `nvmem_device_read'
arch/arm/mach-davinci/built-in.o: In function `read_factory_config':
:(.text+0x214c): undefined reference to `nvmem_device_read'
Also, when NVMEM is completely disabled, the functions fail with
nonobvious error messages.
This ensures we only call the API functions when the code is actually
reachable from the board file, and otherwise prints a unique log
message.
Signed-off-by: Arnd Bergmann <[email protected]> Fixes: bec3c11bad0e ("misc: at24: replace memory_accessor with nvmem_device_read") Signed-off-by: Sekhar Nori <[email protected]> Signed-off-by: Kevin Hilman <[email protected]>
* emailed patches from Andrew Morton <[email protected]>:
Documentation/sysctl/vm.txt: update numa_zonelist_order description
lib/stackdepot.c: allow the stack trace hash to be zero
rapidio: fix potential NULL pointer dereference
mm/memory-failure: fix race with compound page split/merge
ocfs2/dlm: return zero if deref_done message is successfully handled
Ananth has moved
kcov: don't profile branches in kcov
kcov: don't trace the code coverage code
mm: wake kcompactd before kswapd's short sleep
.mailmap: add Frank Rowand
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: call swap_slot_free_notify() with page lock held
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
mailmap: fix Krzysztof Kozlowski's misspelled name
thp: keep huge zero page pinned until tlb flush
mm: exclude HugeTLB pages from THP page_mapped() logic
kexec: export OFFSET(page.compound_head) to find out compound tail page
kexec: update VMCOREINFO for compound_order/dtor
Paolo Abeni [Thu, 28 Apr 2016 09:04:51 +0000 (11:04 +0200)]
ip_tunnel: fix preempt warning in ip tunnel creation/updating
After the commit e09acddf873b ("ip_tunnel: replace dst_cache with generic
implementation"), a preemption debug warning is triggered on ip4
tunnels updating; the dst cache helper needs to be invoked in unpreemptible
context.
We don't need to load the cache on tunnel update, so this commit fixes
the warning replacing the load with a dst cache reset, which is
preempt safe.
Tony Luck [Fri, 29 Apr 2016 13:42:25 +0000 (15:42 +0200)]
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
Both of these drivers can return NOTIFY_BAD, but this terminates
processing other callbacks that were registered later on the chain.
Since the driver did nothing to log the error it seems wrong to prevent
other interested parties from seeing it. E.g. neither of them had even
bothered to check the type of the error to see if it was a memory error
before the return NOTIFY_BAD.
* pm-cpufreq-fixes:
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "cpufreq: governor: Fix negative idle_time when configured with CONFIG_HZ_PERIODIC"
Sven Eckelmann [Fri, 11 Mar 2016 15:44:06 +0000 (16:44 +0100)]
batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node
The batadv_neigh_node was specific to a batadv_hardif_neigh_node and held
an implicit reference to it. But this reference was never stored in form of
a pointer in the batadv_neigh_node itself. Instead
batadv_neigh_node_release depends on a consistent state of
hard_iface->neigh_list and that batadv_hardif_neigh_get always returns the
batadv_hardif_neigh_node object which it has a reference for. But
batadv_hardif_neigh_get cannot guarantee that because it is working only
with rcu_read_lock on this list. It can therefore happen that a neigh_addr
is in this list twice or that batadv_hardif_neigh_get cannot find the
batadv_hardif_neigh_node for an neigh_addr due to some other list
operations taking place at the same time.
Instead add a batadv_hardif_neigh_node pointer directly in
batadv_neigh_node which will be used for the reference counter decremented
on release of batadv_neigh_node.
Fixes: cef63419f7db ("batman-adv: add list of unique single hop neighbors per hard-interface") Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
Sven Eckelmann [Fri, 11 Mar 2016 15:44:05 +0000 (16:44 +0100)]
batman-adv: Fix reference counting of vlan object for tt_local_entry
The batadv_tt_local_entry was specific to a batadv_softif_vlan and held an
implicit reference to it. But this reference was never stored in form of a
pointer in the tt_local_entry itself. Instead batadv_tt_local_remove,
batadv_tt_local_table_free and batadv_tt_local_purge_pending_clients depend
on a consistent state of bat_priv->softif_vlan_list and that
batadv_softif_vlan_get always returns the batadv_softif_vlan object which
it has a reference for. But batadv_softif_vlan_get cannot guarantee that
because it is working only with rcu_read_lock on this list. It can
therefore happen that an vid is in this list twice or that
batadv_softif_vlan_get cannot find the batadv_softif_vlan for an vid due to
some other list operations taking place at the same time.
Instead add a batadv_softif_vlan pointer directly in batadv_tt_local_entry
which will be used for the reference counter decremented on release of
batadv_tt_local_entry.
Fixes: 35df3b298fc8 ("batman-adv: fix TT VLAN inconsistency on VLAN re-add") Signed-off-by: Sven Eckelmann <[email protected]> Acked-by: Antonio Quartulli <[email protected]> Signed-off-by: Marek Lindner <[email protected]> Signed-off-by: Antonio Quartulli <[email protected]>
batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event
At the moment there is no explicit reactivation of an hard-interface
upon NETDEV_UP event. In case of B.A.T.M.A.N. IV the interface is
reactivated as soon as the next OGM is scheduled for sending, but this
mechanism does not work with B.A.T.M.A.N. V. The latter does not rely
on the same scheduling mechanism as its predecessor and for this reason
the hard-interface remains deactivated forever after being brought down
once.
This patch fixes the reactivation mechanism by adding a new routing API
which explicitly allows each algorithm to perform any needed operation
upon interface re-activation.
Such API is optional and is implemented by B.A.T.M.A.N. V only and it
just takes care of setting the iface status to ACTIVE
batman-adv: fix DAT candidate selection (must use vid)
Now that DAT is VLAN aware, it must use the VID when
computing the DHT address of the candidate nodes where
an entry is going to be stored/retrieved.
Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware") Signed-off-by: Antonio Quartulli <[email protected]>
[[email protected]: fix conflicts with current version] Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Marek Lindner <[email protected]>
Dave Airlie [Fri, 29 Apr 2016 04:31:44 +0000 (14:31 +1000)]
Merge branch 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A few fixes for 4.6.
- revert amdgpu PX commit that was previously reverted on the radeon side
- cleaned up version of the NI+ MC update display fix for radeon
- TTM kref fix
* 'drm-fixes-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: disable vm interrupts with vm_fault_stop=2
drm/amdgpu: print a message if ATPX dGPU power control is missing
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
drm/radeon: fix vertical bars appear on monitor (v2)
drm/ttm: fix kref count mess in ttm_bo_move_to_lru_tail
Dave Airlie [Fri, 29 Apr 2016 04:27:50 +0000 (14:27 +1000)]
Merge branch 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux into drm-fixes
three misc vmwgfx fixes
* 'drm-vmwgfx-fixes' of git://people.freedesktop.org/~syeh/repos_linux:
drm/vmwgfx: Fix order of operation
drm/vmwgfx: use vmw_cmd_dx_cid_check for query commands.
drm/vmwgfx: Enable SVGA_3D_CMD_DX_SET_PREDICATION
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two boot crash fixes and an IRQ handling crash fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Handle zero vector gracefully in clear_vector_irq()
Revert "x86/mm/32: Set NX in __supported_pte_mask before enabling paging"
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"x86 PMU driver fixes plus a core code race fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix incorrect lbr_sel_mask value
perf/x86/intel/pt: Don't die on VMXON
perf/core: Fix perf_event_open() vs. execve() race
perf/x86/amd: Set the size of event map array to PERF_COUNT_HW_MAX
perf/core: Make sysctl_perf_cpu_time_max_percent conform to documentation
perf/x86/intel/rapl: Add missing Haswell model
perf/x86/intel: Add model number for Skylake Server to perf
Merge tag 'sound-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Usually we get a big collection of fixes for ASoC once during rc. And
this is it.
At this time, most of fixes are about Intel Skylake ASoC driver, which
is a new and still on-going development. Along with it, a slight
large LOC is seen in legacy HD-audio driver, but it's merely a code
move to the upper layer.
Other than that, the rest are small or trivial fixes to various
drivers, in addition to an ASoC dapm debugfs code fix"
* tag 'sound-4.6-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
ALSA: hda - Update BCLK also at hotplug for i915 HSW/BDW
ALSA: hda - Add dock support for ThinkPad X260
ASoC: wm5102: Free compressed IRQ in CODEC remove
ASoC: arizona: Free speaker thermal IRQs in CODEC remove
ASoC: Intel: Skylake: Fix ibs/obs calc for non-integral sampling rates
ASoC: Intel: sst: fix a loop timeout in sst_hsw_stream_reset()
ASoC: Intel: Skylake: Fix to turn OFF codec power when entering S3
ASoC: hdac_hdmi: Fix codec power state in S3 during playback
ASoC: hdac_hdmi: Fix to use dev_pm ops instead soc pm
ASoC: wm8962: Correct typo when setting DSPCLK rate
ASoC: nau8825: Fix jack detection across suspend
ASoC: Intel: Skylake: Fix DSP resource de-allocation
ASoC: Intel: Skylake: Fix for unloading module only when it is loaded
ASoC: Intel: Skylake: Fix kbuild dependency
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: rt5640: Correct the digital interface data select
ASoC: Intel: Skylake: remove call to pci_dev_put
ASoC: Intel: Skylake: Call i915 exit last
ASoC: Intel: Skylake: Unmap the address last
ASoC: Intel: Skylake: Freeup properly on skl_dsp_free
...
Commit 3193913ce62c ("mm: page_alloc: default node-ordering on 64-bit
NUMA, zone-ordering on 32-bit") changes the default value of
numa_zonelist_order. Update the document.
lib/stackdepot.c: allow the stack trace hash to be zero
Do not bail out from depot_save_stack() if the stack trace has zero hash.
Initially depot_save_stack() silently dropped stack traces with zero
hashes, however there's actually no point in reserving this zero value.
The change fixes improper check for a returned error value by
class_create() function, which on error returns ERR_PTR() value, thus the
original check always results in a dead code on error path.
James Morse [Thu, 28 Apr 2016 23:18:52 +0000 (16:18 -0700)]
kcov: don't trace the code coverage code
Kcov causes the compiler to add a call to __sanitizer_cov_trace_pc() in
every basic block. Ftrace patches in a call to _mcount() to each
function it has annotated.
Letting these mechanisms annotate each other is a bad thing. Break the
loop by adding 'notrace' to __sanitizer_cov_trace_pc() so that ftrace
won't try to patch this code.
This patch lets arm64 with KCOV and STACK_TRACER boot.
When kswapd goes to sleep it checks if the node is balanced and at first
it sleeps only for HZ/10 time, then rechecks if the node is still
balanced and nobody has woken it during the initial sleep. Only then it
goes fully sleep until an allocation slowpath wakes it up again.
For higher-order allocations, waking up kcompactd is done only before
the full sleep. This turns out to be an issue in case another
high-order allocation fails during the initial sleep. It will wake
kswapd up, however kswapd considers the zone balanced from the order-0
perspective, and will just quickly try to sleep again. So if there's a
longer stream of high-order allocations hitting the slowpath and waking
up kswapd, it might never actually wake up kcompactd, which may be
considered a regression from kswapd-based compaction. In the worst
case, it might be that a single allocation that cannot direct
reclaim/compact itself is waking kswapd in the retry loop and preventing
kcompactd from being woken up and unblocking it.
This patch makes sure kcompactd is woken up in such situations by simply
moving the wakeup before the short initial sleep. More efficient
solution would be to wake kcompactd immediately instead of kswapd if the
node is already order-0 balanced, but in that case we should also move
reset_isolation_suitable() call to kcompactd so it's not adding to the
allocator's latency. Since it's late in the 4.6 cycle, let's go with
the simpler change for now.
Currently, migration code increses num_poisoned_pages on *failed*
migration page as well as successfully migrated one at the trial of
memory-failure. It will make the stat wrong. As well, it marks the
page as PG_HWPoison even if the migration trial failed. It would mean
we cannot recover the corrupted page using memory-failure facility.
Minchan Kim [Thu, 28 Apr 2016 23:18:41 +0000 (16:18 -0700)]
mm: call swap_slot_free_notify() with page lock held
Kyeongdon reported below error which is BUG_ON(!PageSwapCache(page)) in
page_swap_info. The reason is that page_endio in rw_page unlocks the
page if read I/O is completed so we need to hold a PG_lock again to
check PageSwapCache. Otherwise, the page can be removed from swapcache.
Kernel BUG at c00f9040 [verbose debug info unavailable]
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 4 PID: 13446 Comm: RenderThread Tainted: G W 3.10.84-g9f14aec-dirty #73
task: c3b73200 ti: dd192000 task.ti: dd192000
PC is at page_swap_info+0x10/0x2c
LR is at swap_slot_free_notify+0x18/0x6c
pc : [<c00f9040>] lr : [<c00f5560>] psr: 400f0113
sp : dd193d78 ip : c2deb1e4 fp : da015180
r10: 00000000 r9 : 000200da r8 : c120fe08
r7 : 00000000 r6 : 00000000 r5 : c249a6c0 r4 : = c249a6c0
r3 : 00000000 r2 : 40080009 r1 : 200f0113 r0 : = c249a6c0
..<snip> ..
Call Trace:
page_swap_info+0x10/0x2c
swap_slot_free_notify+0x18/0x6c
swap_readpage+0x90/0x11c
read_swap_cache_async+0x134/0x1ac
swapin_readahead+0x70/0xb0
handle_pte_fault+0x320/0x6fc
handle_mm_fault+0xc0/0xf0
do_page_fault+0x11c/0x36c
do_DataAbort+0x34/0x118
Minchan Kim [Thu, 28 Apr 2016 23:18:38 +0000 (16:18 -0700)]
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
We have been reclaimed highmem zone if buffer_heads is over limit but
commit 6b4f7799c6a5 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so it doesn't reclaim highmem zone
although buffer_heads is over the limit. This patch restores the logic.
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
Khugepaged detects own VMAs by checking vm_file and vm_ops but this way
it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet which has no vm_ops and popultes PTEs in mmap.
This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.
mailmap: fix Krzysztof Kozlowski's misspelled name
Patchwork introduced a garbled Polish character in commit 1e3012d0fdc5
("crypto: s5p-sss - Use memcpy_toio for iomem annotated memory") so fix
the mail mapping. Additionally prefer to use kernel.org account for
personal work, instead of my gmail address.
Andrea has found[1] a race condition on MMU-gather based TLB flush vs
split_huge_page() or shrinker which frees huge zero under us (patch 1/2
and 2/2 respectively).
With new THP refcounting, we don't need patch 1/2: mmu_gather keeps the
page pinned until flush is complete and the pin prevents the page from
being split under us.
We still need patch 2/2. This is simplified version of Andrea's patch.
We don't need fancy encoding.
Steve Capper [Thu, 28 Apr 2016 23:18:24 +0000 (16:18 -0700)]
mm: exclude HugeTLB pages from THP page_mapped() logic
HugeTLB pages cannot be split, so we use the compound_mapcount to track
rmaps.
Currently page_mapped() will check the compound_mapcount, but will also
go through the constituent pages of a THP compound page and query the
individual _mapcount's too.
Unfortunately, page_mapped() does not distinguish between HugeTLB and
THP compound pages and assumes that a compound page always needs to have
HPAGE_PMD_NR pages querying.
For most cases when dealing with HugeTLB this is just inefficient, but
for scenarios where the HugeTLB page size is less than the pmd block
size (e.g. when using contiguous bit on ARM) this can lead to crashes.
This patch adjusts the page_mapped function such that we skip the
unnecessary THP reference checks for HugeTLB pages.
kexec: export OFFSET(page.compound_head) to find out compound tail page
PageAnon() always look at head page to check PAGE_MAPPING_ANON and tail
page's page->mapping has just a poisoned data since commit 1c290f642101
("mm: sanitize page->mapping for tail pages").
If makedumpfile checks page->mapping of a compound tail page to
distinguish anonymous page as usual, it must fail in newer kernel. So
it's necessary to export OFFSET(page.compound_head) to avoid checking
compound tail pages.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.5.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
makedumpfile refers page.lru.next to get the order of compound pages for
page filtering.
However, now the order is stored in page.compound_order, hence
VMCOREINFO should be updated to export the offset of
page.compound_order.
The fact is, page.compound_order was introduced already in kernel 4.0,
but the offset of it was the same as page.lru.next until kernel 4.3, so
this was not actual problem.
The above can be said also for page.lru.prev and page.compound_dtor,
it's necessary to detect hugetlbfs pages. Further, the content was
changed from direct address to the ID which means dtor.
The problem is that unnecessary hugepages won't be removed from a dump
file in kernels 4.4.x and later. This means that extra disk space would
be consumed. It's a problem, but not critical.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
"There is a lifecycle fix in the auth code, a fix for a narrow race
condition on map, and a helpful message in the log when there is a
feature mismatch (which happens frequently now that the default
server-side options have changed)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: report unsupported features to syslog
rbd: fix rbd map vs notify races
libceph: make authorizer destruction independent of ceph_auth_client
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Three more bug fixes for 4.6
- Due to a race in the dynamic page table code a multi-threaded
program can cause a translation specification exception. With
panic_on_oops a user space program can crash the system.
- An information leak with the /dev/sclp device.
- A use after free in the s390 PCI code"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/sclp_ctl: fix potential information leak with /dev/sclp
s390/mm: fix asce_bits handling with dynamic pagetable levels
s390/pci: fix use after free in dma_init
llvm cannot always recognize memset as builtin function and optimize
it away, so just delete it. It was a leftover from testing
of bpf_perf_event_output() with large data structures.
Fixes: 39111695b1b8 ("samples: bpf: add bpf_perf_event_output example") Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
The commit 35578d798400 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
introduced clever way to check bpf_helper<->map_type compatibility.
Later on commit a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") adjusted
the logic and inadvertently broke it.
Get rid of the clever bool compare and go back to two-way check
from map and from helper perspective.
On a system with >32Gbyte of phyiscal memory and infinite RLIMIT_MEMLOCK,
the malicious application may overflow 32-bit bpf program refcnt.
It's also possible to overflow map refcnt on 1Tb system.
Impose 32k hard limit which means that the same bpf program or
map cannot be shared by more than 32k processes.
This series fixes a number of related issues around using phy-handle
properties in cpsw emac nodes.
Patch 1 fixes a bug if more than one slave is used, and either
slave uses the phy-handle property in the devicetree.
Patch 2 fixes a NULL pointer dereference which can occur if a
phy-handle property is used and of_phy_connect() return NULL,
such as with a bad devicetree.
Patch 3 fixes an issue where the phy-mode property would be ignored
if a phy-handle property was used. This also fixes a bogus error
message that would be emitted.
Patch 4 fixes makes the binding documentation more explicit that
exactly one PHY property should be used, and also marks phy_id as
deprecated.
Patch 5 cleans up the fixed-link case to work like the now-fixed
phy-handle case.
I have tested on the following hardware configurations:
- (EVMSK) dual emac, phy_id property in both slaves
- (EVMSK) dual emac, phy-handle property in both slaves
- (EVMSK) a bad phy-handle property pointing to &mmc1
- (EVMSK) phy_id property with incorrect PHY address
- (BeagleBoneBlack) single emac, phy_id property
- (custom) single emac, fixed-link subnode
Andrew Goodbody reported testing v2 on a board that doesn't use
dual_emac mode, but with 2 PHYs using phy-handle properties [1].
Nicolas Chauvet reported testing v2 on an HP t410 (dm8148).
Markus Brunner reported testing v1 on the following [2]:
- emac0 with phy_id and emac1 with fixed phy
- emac0 with phy-handle and emac1 with fixed phy
- emac0 with fixed phy and emac1 with fixed phy
David Rivshin [Thu, 28 Apr 2016 01:45:45 +0000 (21:45 -0400)]
drivers: net: cpsw: use of_phy_connect() in fixed-link case
If a fixed-link DT subnode is used, the phy_device was looked up so
that a PHY ID string could be constructed and passed to phy_connect().
This is not necessary, as the device_node can be passed directly to
of_phy_connect() instead. This reuses the same codepath as if the
phy-handle DT property was used.
David Rivshin [Thu, 28 Apr 2016 01:38:26 +0000 (21:38 -0400)]
drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
The phy-mode emac property was only being processed in the phy_id
or fixed-link cases. However if phy-handle was specified instead,
an error message would complain about the lack of phy_id or
fixed-link, and then jump past the of_get_phy_mode(). This would
result in the PHY mode defaulting to MII, regardless of what the
devicetree specified.
David Rivshin [Thu, 28 Apr 2016 01:32:31 +0000 (21:32 -0400)]
drivers: net: cpsw: fix segfault in case of bad phy-handle
If an emac node has a phy-handle property that points to something
which is not a phy, then a segmentation fault will occur when the
interface is brought up. This is because while phy_connect() will
return ERR_PTR() on failure, of_phy_connect() will return NULL.
The common error check uses IS_ERR(), and so missed when
of_phy_connect() fails. The NULL pointer is then dereferenced.
Also, the common error message referenced slave->data->phy_id,
which would be empty in the case of phy-handle. Instead, use the
name of the device_node as a useful identifier. And in the phy_id
case add the error code for completeness.
Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing") Signed-off-by: David Rivshin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David Rivshin [Thu, 28 Apr 2016 01:25:25 +0000 (21:25 -0400)]
drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
Commit 9e42f715264ff158478fa30eaed847f6e131366b ("drivers: net: cpsw: add
phy-handle parsing") saved the "phy-handle" phandle into a new cpsw_priv
field. However, phy connections are per-slave, so the phy_node field should
be in cpsw_slave_data rather than cpsw_priv.
This would go unnoticed in a single emac configuration. But in dual_emac
mode, the last "phy-handle" property parsed for either slave would be used
by both of them, causing them both to refer to the same phy_device.
The collect metadata mode does not support GUE nor FOU. This might be
implemented later; until then, we should reject such config.
I think this is okay to be changed. It's unlikely anyone has such
configuration (as it doesn't work anyway) and we may need a way to
distinguish whether it's supported or not by the kernel later.
For backwards compatibility with iproute2, it's not possible to just check
the attribute presence (iproute2 always includes the attribute), the actual
value has to be checked, too.
Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>
As noticed by Lincoln Ramsay <a1291762@gmail.com> some old (usb 1.1) Pegasus
based devices may actually return more bytes than the specified in the datasheet
amount. That would not be a problem if the allocated space for the SKB was
equal to the parameter passed to usb_fill_bulk_urb(). Some poor bugger (i
really hope it was not me, but 'git blame' is useless in this case, so anyway)
decided to add '+ 8' to the buffer length parameter. Sometimes the usb transfer
overflows and corrupts the socket structure, leading to kernel panic.
The above doesn't seem to happen for newer (Pegasus2 based) devices which did
help this bug to hide for so long.
The new default is to not include the CRC at the end of each received package.
So far CRC has been ignored which makes no sense to do it in a first place.
The patch is against v4.6-rc5 and was tested on ADM8515 device by transferring
multiple gigabytes of data over a couple of days without any complaints from the
kernel. Please apply it to whatever net tree you deem fit.
Changes since v1:
- split the patch in two parts;
- corrected the subject lines;
Changes since v2:
- do not append CRC by default (based on a discussion with Johannes Berg);
====================
The default Pegasus setup was to append the status and CRC at the end of each
received packet. The status bits are used to update various stats, but CRC has
been ignored. The new default is to not append CRC at the end of RX packets.
usb_fill_bulk_urb() receives buffer length parameter 8 bytes larger
than what's allocated by alloc_skb(); This seems to be a problem with
older (pegasus usb-1.1) devices, which may silently return more data
than the maximal packet length.
David S. Miller [Thu, 28 Apr 2016 21:02:45 +0000 (17:02 -0400)]
Merge branch 'gre-lwt-fixes'
Jiri Benc says:
====================
gre: fix lwtunnel support
This patchset fixes a few bugs in ipgre metadata mode implementation.
As an example, in this setup:
ip a a 192.168.1.1/24 dev eth0
ip l a gre1 type gre external
ip l s gre1 up
ip a a 192.168.99.1/24 dev gre1
ip r a 192.168.99.2/32 encap ip dst 192.168.1.2 ttl 10 dev gre1
ping 192.168.99.2
the traffic does not go through before this patchset and does as expected
with it applied.
v3: Back to v1 in order not to break existing users. Dropped patch 3, will
be fixed in iproute2 instead.
v2: Rejecting invalid configuration, added patch 3, dropped patch for
ETH_P_TEB (will target net-next).
====================
gre: build header correctly for collect metadata tunnels
In ipgre (i.e. not gretap) + collect metadata mode, the skb was assumed to
contain Ethernet header and was encapsulated as ETH_P_TEB. This is not the
case, the interface is ARPHRD_IPGRE and the protocol to be used for
encapsulation is skb->protocol.
gre: do not assign header_ops in collect metadata mode
In ipgre mode (i.e. not gretap) with collect metadata flag set, the tunnel
is incorrectly assumed to be mGRE in NBMA mode (see commit 6a5f44d7a048c).
This is not the case, we're controlling the encapsulation addresses by
lwtunnel metadata. And anyway, assigning dev->header_ops in collect metadata
mode does not make sense.
Although it would be more user firendly to reject requests that specify
both the collect metadata flag and a remote/local IP address, this would
break current users of gretap or introduce ugly code and differences in
handling ipgre and gretap configuration. Keep the current behavior of
remote/local IP address being ignored in such case.
v3: Back to v1, added explanation paragraph.
v2: Reject configuration specifying both remote/local address and collect
metadata flag.
Fixes: 2e15ea390e6f4 ("ip_gre: Add support to collect tunnel metadata.") Signed-off-by: Jiri Benc <[email protected]> Signed-off-by: David S. Miller <[email protected]>