We create a number of work structs to be queued up to a workqueue, and
on completion of the workqueue handler, the workqueue handler frees the
allocated memory. If, however, we don't queue the work struct because
the device is going down, then we need to free the memory ourselves.
IB/mlx4: Optimize freeing of items on error unwind
On failure, we loop through all possible pointers and test them before
calling kfree. But really, why even attempt to free items we didn't
allocate when we can easily loop through exactly and only the devices
for which the original memory allocation succeeded and free just those.
Or Gerlitz [Thu, 25 Jun 2015 14:45:38 +0000 (17:45 +0300)]
IB/mlx4: Fix use of flow-counters for process_mad
For IB links, reading HCA flow counters through iboe_process_mad() should
be used when mlx4_ib_process_mad() is invoked only for VFs PMA queries and
exactly nothing else.
Fixes: 7193a141eb74 ('IB/mlx4: Set VF to read from QP counters') Reported-by: Linus Torvalds <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
Vaishali Thakkar [Tue, 16 Jun 2015 11:43:05 +0000 (17:13 +0530)]
IB/ipath: Convert use of __constant_<foo> to <foo>
In little endian cases, the macros be16_to_cpu and cpu_to_be64
unfolds to __swab{16,64} which provides special case for constants.
In big endian cases, __constant_be16_to_cpu and be16_to_cpu
expand directly to the same expression. The same applies for
__constant_cpu_to_be64 and cpu_to_be64.
So, replace __constant_be16_to_cpu with be16_to_cpu and
__constant_cpu_to_be64 with cpu_to_be64, with the goal of getting
rid of the definition of __constant_be16_to_cpu and
__constant_cpu_to_be64 completely.
IB/ipoib: Scatter-Gather support in connected mode
By default, IPoIB-CM driver uses 64k MTU. Larger MTU gives better
performance.
This MTU plus overhead puts the memory allocation for IP based packets at
32 4k pages (order 5), which have to be contiguous.
When the system memory under pressure, it was observed that allocating 128k
contiguous physical memory is difficult and causes serious errors (such as
system becomes unusable).
This enhancement resolve the issue by removing the physically contiguous
memory requirement using Scatter/Gather feature that exists in Linux stack.
With this fix Scatter-Gather will be supported also in connected mode.
This change reverts some of the change made in commit e112373fd6aa
("IPoIB/cm: Reduce connected mode TX object size").
The ability to use SG in IPoIB CM is possible because the coupling
between NETIF_F_SG and NETIF_F_CSUM was removed in commit ec5f06156423 ("net: Kill link between CSUM and SG features.")
Haggai Eran [Tue, 7 Jul 2015 14:45:13 +0000 (17:45 +0300)]
IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush
__ipoib_ib_dev_flush calls itself recursively on child devices, and lockdep
complains about locking vlan_rwsem twice (see below). Use down_read_nested
instead of down_read to prevent the warning.
=============================================
[ INFO: possible recursive locking detected ]
4.1.0-rc4+ #36 Tainted: G O
---------------------------------------------
kworker/u20:2/261 is trying to acquire lock:
(&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
but task is already holding lock:
(&priv->vlan_rwsem){.+.+..}, at: [<ffffffffa0791e2a>] __ipoib_ib_dev_flush+0x3a/0x2b0 [ib_ipoib]
other info that might help us debug this:
Possible unsafe locking scenario:
Haggai Eran [Tue, 7 Jul 2015 14:45:12 +0000 (17:45 +0300)]
IB/ucma: Fix lockdep warning in ucma_lock_files
The ucma_lock_files() locks the mut mutex on two files, e.g. for migrating
an ID. Use mutex_lock_nested() to prevent the warning below.
=============================================
[ INFO: possible recursive locking detected ]
4.1.0-rc6-hmm+ #40 Tainted: G O
---------------------------------------------
pingpong_rpc_se/10260 is trying to acquire lock:
(&file->mut){+.+.+.}, at: [<ffffffffa047ac55>] ucma_migrate_id+0xc5/0x248 [rdma_ucm]
but task is already holding lock:
(&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&file->mut);
lock(&file->mut);
*** DEADLOCK ***
May be due to missing lock nesting notation
1 lock held by pingpong_rpc_se/10260:
#0: (&file->mut){+.+.+.}, at: [<ffffffffa047ac4b>] ucma_migrate_id+0xbb/0x248 [rdma_ucm]
Wengang Wang [Mon, 6 Jul 2015 06:35:11 +0000 (14:35 +0800)]
rds: rds_ib_device.refcount overflow
Fixes: 3e0249f9c05c ("RDS/IB: add refcount tracking to struct rds_ib_device")
There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr
failed(mr pool running out). this lead to the refcount overflow.
A complain in line 117(see following) is seen. From vmcore:
s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448.
That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely
to return ERR_PTR(-EAGAIN).
RDMA/core: Fixes for port mapper client registration
Fixes to allow clients to make remove mapping requests, after
they have provided the user space service with the mapping
information, they are using when the service is restarted.
1) Adding IWPM_REG_VALID, IWPM_REG_INCOMPL and IWPM_REG_UNDEF
registration types for the port mapper clients and functions
to set/check the registration type.
2) If the port mapper user space service is not available to register
the client, then its registration stays IWPM_REG_UNDEF and the
registration isn't checked until the service becomes available
(no mappings are possible, if the user space service isn't running).
3) After the service is restarted, the user space port mapper pid is set
to valid and the client registration is set to IWPM_REG_INCOMPL
to allow the client to make remove mapping requests.
Amir Vadai [Wed, 1 Jul 2015 11:31:01 +0000 (14:31 +0300)]
IB/IPoIB: Fix bad error flow in ipoib_add_port()
Error values of ib_query_port() and ib_query_device() weren't propagated
correctly. Because of that, ipoib_add_port() could return NULL value,
which escaped the IS_ERR() check in ipoib_add_one() and we crashed.
IB/mlx4: Do not attemp to report HCA clock offset on VFs
mlx4 VFs can provide CQE raw time-stamping services, but they
don't have the hca core clock mapped to their PCI bars.
As such, we should not attempt to query and report the clock offset
to user space for VFs. Doing so causes query_device over VFs to fail
with -ENOSUPP.
Fixes: 4b664c4355b2 ('IB/mlx4: Add support for CQ time-stamping') Signed-off-by: Matan Barak <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: Doug Ledford <[email protected]>
Erez Shitrit [Thu, 25 Jun 2015 14:13:22 +0000 (17:13 +0300)]
IB/cm: Do not queue work to a device that's going away
Whenever ib_cm gets remove_one call, like when there is a hot-unplug
event, the driver should mark itself as going_down and confirm that no
new works are going to be queued for that device.
so, the order of the actions are:
1. mark the going_down bit.
2. flush the wq.
3. [make sure no new works for that device.]
4. unregister mad agent.
otherwise, works that are already queued can be scheduled after the mad
agent was freed.
Vaishali Thakkar [Wed, 24 Jun 2015 04:42:13 +0000 (10:12 +0530)]
IB/srpt: Convert use of __constant_cpu_to_beXX to cpu_to_beXX
In little endian cases, the macro cpu_to_be{16,32,64} unfolds to
__swab{16,32,64} which provides special case for constants. In
big endian cases, __constant_cpu_to_be{16,32,64} and
cpu_to_be{16,32,64} expand directly to the same expression. So,
replace __constant_cpu_to_be{16,32,64} with cpu_to_be{16,32,64}
with the goal of getting rid of the definitions of
__constant_cpu_to_be{16,32,64} completely.
The Coccinelle semantic patch that performs this transformation
is as follows:
Hal Rosenstock [Mon, 29 Jun 2015 13:57:00 +0000 (09:57 -0400)]
IB: Add rdma_cap_ib_switch helper and use where appropriate
Persuant to Liran's comments on node_type on linux-rdma
mailing list:
In an effort to reform the RDMA core and ULPs to minimize use of
node_type in struct ib_device, an additional bit is added to
struct ib_device for is_switch (IB switch). This is needed
to be initialized by any IB switch device driver. This is a
NEW requirement on such device drivers which are all
"out of tree".
In addition, an ib_switch helper was added to ib_verbs.h
based on the is_switch device bit rather than node_type
(although those should be consistent).
The RDMA core (MAD, SMI, agent, sa_query, multicast, sysfs)
as well as (IPoIB and SRP) ULPs are updated where
appropriate to use this new helper. In some cases,
the helper is now used under the covers of using
rdma_[start end]_port rather than the open coding
previously used.
Filipe Manana [Tue, 14 Jul 2015 15:09:39 +0000 (16:09 +0100)]
Btrfs: fix file corruption after cloning inline extents
Using the clone ioctl (or extent_same ioctl, which calls the same extent
cloning function as well) we end up allowing copy an inline extent from
the source file into a non-zero offset of the destination file. This is
something not expected and that the btrfs code is not prepared to deal
with - all inline extents must be at a file offset equals to 0.
For example, the following excerpt of a test case for fstests triggers
a crash/BUG_ON() on a write operation after an inline extent is cloned
into a non-zero offset:
_scratch_mkfs >>$seqres.full 2>&1
_scratch_mount
# Create our test files. File foo has the same 2K of data at offset 4K
# as file bar has at its offset 0.
$XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 4K" \
-c "pwrite -S 0xbb 4k 2K" \
-c "pwrite -S 0xcc 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# File bar consists of a single inline extent (2K size).
$XFS_IO_PROG -f -s -c "pwrite -S 0xbb 0 2K" \
$SCRATCH_MNT/bar | _filter_xfs_io
# Now call the clone ioctl to clone the extent of file bar into file
# foo at its offset 4K. This made file foo have an inline extent at
# offset 4K, something which the btrfs code can not deal with in future
# IO operations because all inline extents are supposed to start at an
# offset of 0, resulting in all sorts of chaos.
# So here we validate that clone ioctl returns an EOPNOTSUPP, which is
# what it returns for other cases dealing with inlined extents.
$CLONER_PROG -s 0 -d $((4 * 1024)) -l $((2 * 1024)) \
$SCRATCH_MNT/bar $SCRATCH_MNT/foo
# Because of the inline extent at offset 4K, the following write made
# the kernel crash with a BUG_ON().
$XFS_IO_PROG -c "pwrite -S 0xdd 6K 2K" $SCRATCH_MNT/foo | _filter_xfs_io
status=0
exit
The stack trace of the BUG_ON() triggered by the last write is:
Fix this by returning the error EOPNOTSUPP if an attempt to copy an
inline extent into a non-zero offset happens, just like what is done for
other scenarios that would require copying/splitting inline extents,
which were introduced by the following commits:
00fdf13a2e9f ("Btrfs: fix a crash of clone with inline extents's split") 3f9e3df8da3c ("btrfs: replace error code from btrfs_drop_extents")
Paul Burton [Fri, 10 Jul 2015 15:00:24 +0000 (16:00 +0100)]
MIPS: Require O32 FP64 support for MIPS64 with O32 compat
MIPS32r6 code requires FP64 (ie. FR=1) support. Building a kernel with
support for MIPS32r6 binaries but without support for O32 with FP64 is
therefore a problem which can lead to incorrectly executed userland.
CONFIG_MIPS_O32_FP64_SUPPORT is already selected when the kernel is
configured for MIPS32r6, but not when the kernel is configured for
MIPS64r6 with O32 compat support. Select CONFIG_MIPS_O32_FP64_SUPPORT in
such configurations to prevent building kernels which execute MIPS32r6
userland incorrectly.
ALSA: line6: Fix -EBUSY error during active monitoring
When a monitor stream is active, the next PCM stream access results in
EBUSY error because of the check in line6_stream_start(). Fix this by
just skipping the submission of pending URBs when the stream is
already running instead.
It breaks existing old userspace which doesn't handle UNKNOWN
swizzling correct. Yes UNKNOWN was a thing back in 2009 and probably
still is on some other platforms, but it still pretty clearly broke
the testers machine. If we want this we need to extend the ioctl with
new paramters that only new userspace looks at.
Olof Johansson [Tue, 14 Jul 2015 09:16:55 +0000 (11:16 +0200)]
Merge tag 'socfpga_fixes_for_v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into fixes
Merge "SoCFPGA fixes for v4.2-rc1" from Dinh Nguyen:
SoCFPGA fixes against v4.2-rc1
- Update compatible "adxl345x" compatible string
- Alphabetize the DTS nodes for the C5 sockit board file
* tag 'socfpga_fixes_for_v4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
ARM: socfpga: dts: Fix entries order
ARM: socfpga: dts: Fix adxl34x formating and compatible string
Ux500 is regressing due to commit a21763a0b1e5a5ab8310f581886d04beadc16616
"pinctrl: nomadik: activate strict mux mode" which disallows
Nomadik GPIO 5 to be muxed in as a level shifter voltage select
pin, as it is currently described as being used for RX on UART1.
The behaviour is correct, instead the hardware config has been
incorrecly specified: UART1 is indeed unused on HREFv60plus and
Snowball and that is why HREFv60plus can use the pins it would
normally occupy as the voltage select line for the MMC/SD
levelshifter (Snowball uses it for I2C4).
The reason UART1 was anyway enabled on these platforms was
probably to secure the port enumeration to userspace. This
can be solved by using aliases (done in a separate patch) so
we can now deactivate UART1 and let MMC/SD use it properly
on HREFv60plus. We explicitly activate it only for the
older HREFprev60 board.
To complete, we set up the pin configuration for these pins
properly in the sdi0 node.
This enumerates the PL011 serial ports on the Ux500. This is
necessary to do if we want to remove one of the serial ports,
since userspace depends on console to be present on ttyAMA2
and we must not break userspace.
drm/i915: Forward all core DRM ioctls to core compat handling
Previously only core DRM ioctls under the DRM_COMMAND_BASE were being
forwarded, but the drm.h header suggests (and reality confirms) ones
after (and including) DRM_COMMAND_END should be forwarded as well.
We need this to correctly forward the compat ioctl for the botched-up
addfb2.1 extension.
Alex Gartrell [Sun, 5 Jul 2015 21:28:26 +0000 (14:28 -0700)]
ipvs: skb_orphan in case of forwarding
It is possible that we bind against a local socket in early_demux when we
are actually going to want to forward it. In this case, the socket serves
no purpose and only serves to confuse things (particularly functions which
implicitly expect sk_fullsock to be true, like ip_local_out).
Additionally, skb_set_owner_w is totally broken for non full-socks.
Julian Anastasov [Mon, 29 Jun 2015 18:51:40 +0000 (21:51 +0300)]
ipvs: fix crash if scheduler is changed
I overlooked the svc->sched_data usage from schedulers
when the services were converted to RCU in 3.10. Now
the rare ipvsadm -E command can change the scheduler
but due to the reverse order of ip_vs_bind_scheduler
and ip_vs_unbind_scheduler we provide new sched_data
to the old scheduler resulting in a crash.
To fix it without changing the scheduler methods we
have to use synchronize_rcu() only for the editing case.
It means all svc->scheduler readers should expect a
NULL value. To avoid breakage for the service listing
and ipvsadm -R we can use the "none" name to indicate
that scheduler is not assigned, a state when we drop
new connections.
Reported-by: Alexander Vasiliev <[email protected]> Fixes: ceec4c381681 ("ipvs: convert services to rcu") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
Julian Anastasov [Sat, 27 Jun 2015 11:39:30 +0000 (14:39 +0300)]
ipvs: do not use random local source address for tunnels
Michael Vallaly reports about wrong source address used
in rare cases for tunneled traffic. Looks like
__ip_vs_get_out_rt in 3.10+ is providing uninitialized
dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses
kmalloc. While we retry after seeing EINVAL from routing
for data that does not look like valid local address, it
still succeeded when this memory was previously used from
other dests and with different local addresses. As result,
we can use valid local address that is not suitable for
our real server.
Fix it by providing 0.0.0.0 every time our cache is refreshed.
By this way we will get preferred source address from routing.
Reported-by: Michael Vallaly <[email protected]> Fixes: 026ace060dfe ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <[email protected]> Signed-off-by: Simon Horman <[email protected]>
Arun Bharadwaj [Thu, 4 Jun 2015 18:08:19 +0000 (11:08 -0700)]
ARM: dts: Fix frequency scaling on Gumstix Pepper
The device tree for Gumstix Pepper has DCDC2 and
DCDC3 correctly labelled but the upper limit values
are wrong. The confusion is due to the hardware
quirk where the DCDC2 and DCDC3 wires are flipped
in Pepper.
Adam YH Lee [Thu, 9 Jul 2015 22:37:57 +0000 (15:37 -0700)]
ARM: dts: configure regulators for Gumstix Pepper
Boot process is halting in midway because some of the necessary voltage
regulators are deemed unused and subsequently powered off, leading to
a completely unresponsive system.
Most of the device nodes had correct voltage regulator attachments.
Yet these nodes had to set stricter enforcement on them through
'regulator-boot-on' and 'regulator-always-on' to function correctly.
The consumers of the regulators this commit affect are the followings:
DCDC1: vdd_1v8 system supply, USB-PHY, and ADC
DCDC2: Core domain
DCDC3: MPU core domain
LDO1: RTC
LDO2: 3v3 IO domain
LDO3: USB-PHY; not a boot-time requirement
LDO4: LCD [16:23]
All but LDO3 need to be always-on for the system to be functional.
Additionally regulator-name properties have been added for the kernel to
display the name from the schematic. This will improve diagnostics.
Adam YH Lee [Fri, 12 Jun 2015 20:37:22 +0000 (13:37 -0700)]
ARM: dts: omap3: overo: Update LCD panel names
For Gumstix Overo COMs, the u-boot bootloader typically passes an
argument specifying the default display via the omapdss.def_disp
parameter. When a default display is specified, DSS2 tries to match
this name with either the device tree label (e.g. label=dvi) or,
failing this, the device tree alias (e.g. label=display0). Update the
panel names for the 'lcd43' and 'lcd35' displays in the device tree
such that they match the names passed by u-boot.
Jason Gunthorpe [Tue, 30 Jun 2015 19:15:31 +0000 (13:15 -0600)]
tpm: Fix initialization of the cdev
When a cdev is contained in a dynamic structure the cdev parent kobj
should be set to the kobj that controls the lifetime of the enclosing
structure. In TPM's case this is the embedded struct device.
Also, cdev_init 0's the whole structure, so all sets must be after,
not before. This fixes module ref counting and cdev.
Thomas Gleixner [Mon, 13 Jul 2015 21:22:44 +0000 (23:22 +0200)]
gpio/davinci: Fix race in installing chained irq handler
Fix a race where a pending interrupt could be received and the handler
called before the handler's data has been setup, by converting to
irq_set_chained_handler_and_data().
Merge tag 'iio-fixes-for-4.2b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Second set of IIO fixes for the 4.2 cycle. Note these depend (mostly) on
material in the recent merge window, hence their separation from set (a)
as the fixes-togreg branch predated the merge window. I am running rather
later with these than I would have liked hence the large set.
* stk3310 fixes from Hartmut's review that came in post merge
- fix direction of proximity inline with recent documentation
clarification.
- fix missing REGMAP_I2C dependency
- rework the error handling for raw readings to fix an failure to power
down in the event of a raw reading failing.
- fix a bug in the compensation code which was toggling an extra bit in the
register.
* mmc35240 - reported samplign frequencies were wrong.
* ltr501 fixes
- fix a case of returning the return value of a regmap_read instead of
the value read.
- fix missing regmap dependency
* sx9500 - fix missing default values for ret in a couple of places to handle
the case of no enabled channels.
* tmp006 - check that writes to info_mask elements are actually to writable
ones. Otherwise, writing to any of them will change the sampling frequency.
Merge tag 'iio-fixes-for-4.2a' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
First set of IIO fixes for the 4.2 cycle.
* Fix a regression in hid sensors suspend time as a result of adding runtime
pm. The normal flow of waking up devices in order to go into suspend
(given the devices are normally suspended when not reading) to a regression
in suspend time on some laptops (reports of an additional 8 seconds).
Fix this by checking to see if a user action resulting in the wake up, and
make it a null operation if it didn't. Note that for hid sensors, there is
nothing useful to be done when moving into a full suspend from a runtime
suspend so they might as well be left alone.
* rochip_saradc: fix some missing MODULE_* data including the licence so that
the driver does not taint the kernel incorrectly and can build as a module.
* twl4030 - mark irq as oneshot as it always should have been.
* inv-mpu - write formats for attributes not specified, leading to miss
interpretation of the gyro scale channel when written.
* Proximity ABI clarification. This had snuck through as a mess. Some
drivers thought proximity went in one direction, some the other. We went
with the most common option, documented it and fixed up the drivers going
the other way. Fix for sx9500 included in this set.
* ad624r - fix a wrong shift in the output data.
* at91_adc - remove a false limit on the value of the STARTUP register
applied by too small a type for the device tree parameter.
* cm3323 - clear the bits when setting the integration time (otherwise
we can only ever set more bits in the relevant field).
* bmc150-accel - multiple triggers are registered, but on error were not being
unwound in the opposite order leading to removal of triggers that had not
yet successfully been registered (count down instead of up when unwinding).
* tcs3414 - ensure right part of val / val2 pair read so that the integration
time is not always 0.
* cc10001_adc - bug in kconfig dependency. Use of OR when AND was intended.
Daniel Vetter [Mon, 13 Jul 2015 06:22:22 +0000 (08:22 +0200)]
drm/i915: fix oops in primary_check_plane
On Sun, Jul 12, 2015 at 09:52:51AM -0700, Linus Torvalds wrote:
> On Sun, Jul 12, 2015 at 1:03 AM, Jörg Otte <[email protected]> wrote:
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000009
> > IP: [<ffffffffbd3447bb>] 0xffffffffbd3447bb
>
> Ugh. Please enable KALLSYMS to get sane symbols.
>
> But yes, "crtc_state->base.active" is at offset 9 from "crtc_state",
> so it's pretty clearly just that change frm
>
> - if (intel_crtc->active) {
> + if (crtc_state->base.active) {
>
> and "crtc_state" is NULL.
>
> And the code very much knows that crtc_state can be NULL, since it's
> initialized with
>
> crtc_state = state->base.state ?
> intel_atomic_get_crtc_state(state->base.state,
> intel_crtc) : NULL;
>
> Tssk. Daniel? Should I just revert that commit dec4f799d0a4
> ("drm/i915: Use crtc_state->active in primary check_plane func") for
> now, or is there a better fix? Like just checking crtc_state for NULL?
Indeed embarrassing. I've missed that we still have 1 caller left that's
using the transitional helpers, and those don't fill out
plane_state->state backpointers to the global atomic update since there is
no global atomic update for transitional helpers. Below diff should fix
this - we need to preferentially check crts_state->active and if that's
not set intel_crtc->active should yield the right result for the one
remaining caller (it's in the crtc_disable paths).
Imre Deak [Wed, 8 Jul 2015 16:18:59 +0000 (19:18 +0300)]
drm/i915: remove unused has_dma_mapping flag
After the previous patch this flag will check always clear, as it's
never set for shmem backed and userptr objects, so we can remove it.
Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
[danvet: Yeah this isn't really fixes but it's a nice cleanup to
clarify the code but not really worth the hassle of backmerging. So
just add to -fixes, we're still early in -rc.] Signed-off-by: Daniel Vetter <[email protected]>
Imre Deak [Thu, 9 Jul 2015 09:59:05 +0000 (12:59 +0300)]
drm/i915: avoid leaking DMA mappings
We have 3 types of DMA mappings for GEM objects:
1. physically contiguous for stolen and for objects needing contiguous
memory
2. DMA-buf mappings imported via a DMA-buf attach operation
3. SG DMA mappings for shmem backed and userptr objects
For 1. and 2. the lifetime of the DMA mapping matches the lifetime of the
corresponding backing pages and so in practice we create/release the
mapping in the object's get_pages/put_pages callback.
For 3. the lifetime of the mapping matches that of any existing GPU binding
of the object, so we'll create the mapping when the object is bound to
the first vma and release the mapping when the object is unbound from its
last vma.
Since the object can be bound to multiple vmas, we can end up creating a
new DMA mapping in the 3. case even if the object already had one. This
is not allowed by the DMA API and can lead to leaked mapping data and
IOMMU memory space starvation in certain cases. For example HW IOMMU
drivers (intel_iommu) allocate a new range from their memory space
whenever a mapping is created, silently overriding a pre-existing
mapping.
Fix this by moving the creation/removal of DMA mappings to the object's
get_pages/put_pages callbacks. These callbacks already check for and do
an early return in case of any nested calls. This way objects of the 3.
case also become more like the other object types.
I noticed this issue by enabling DMA debugging, which got disabled after
a while due to its internal mapping tables getting full. It also reported
errors in connection to random other drivers that did a DMA mapping for
an address that was previously mapped by i915 but was never released.
Besides these diagnostic messages and the memory space starvation
problem for IOMMUs, I'm not aware of this causing a real issue.
The fix is based on a patch from Chris.
v2:
- move the DMA mapping create/remove calls to the get_pages/put_pages
callbacks instead of adding new callbacks for these (Chris)
v3:
- also fix the get_page cache logic on the userptr async path (Chris)
Tomas Elf [Thu, 9 Jul 2015 14:30:57 +0000 (15:30 +0100)]
drm/i915: Snapshot seqno of most recently submitted request.
The hang checker needs to inspect whether or not the ring request list is empty
as well as if the given engine has reached or passed the most recently
submitted request. The problem with this is that the hang checker cannot grab
the struct_mutex, which is required in order to safely inspect requests since
requests might be deallocated during inspection. In the past we've had kernel
panics due to this very unsynchronized access in the hang checker.
One solution to this problem is to not inspect the requests directly since
we're only interested in the seqno of the most recently submitted request - not
the request itself. Instead the seqno of the most recently submitted request is
stored separately, which the hang checker then inspects, circumventing the
issue of synchronization from the hang checker entirely.
drm/i915: Convert 'ring_idle()' to use requests not seqnos
v2 (Chris Wilson):
- Pass current engine seqno to ring_idle() from i915_hangcheck_elapsed() rather
than compute it over again.
- Remove extra whitespace.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90112#c23 Signed-off-by: Chris Wilson <[email protected]> Signed-off-by: Daniel Vetter <[email protected]>
perf report: Merge al->filtered with hist_entry->filtered
We stopped dropping samples for things filtered via the --comms, --dsos,
--symbols, etc, i.e. things marked as filtered in the symbol resolution
routines (thread__find_addr_map(), perf_event__preprocess_sample(),
etc).
perf top/tui: Update nr_entries properly after a filter is applied
We don't take into account entries that were filtered in
perf_event__preprocess_sample() and friends, which leads to
inconsistency in the browser seek routines, that expects the number of
hist_entry->filtered entries to match what it thinks is the number of
unfiltered, browsable entries.
So, for instance, when we do:
perf top --symbols ___non_existent_symbol___
the hist_browser__nr_entries() routine thinks there are no filters in
place, uses the hists->nr_entries but all entries are filtered, leading
to a segfault.
Tested with:
perf top --symbols malloc,free --percentage=relative
Freezing, by pressing 'f', at any time and doing the math on the
percentages ends up with 100%, ditto for:
perf top --dsos libpthread-2.20.so,libxul.so --percentage=relative
Both were segfaulting, all fixed now.
More work needed to do away with checking if filters are in place, we
should just use the nr_non_filtered_samples counter, no need to
conditionally use it or hists.nr_filter, as what the browser does is
just show unfiltered stuff. An audit of how it is being accounted is
needed, this is the minimal fix.
1) Missing list head init in bluetooth hidp session creation, from Tedd
Ho-Jeong An.
2) Don't leak SKB in bridge netfilter error paths, from Florian
Westphal.
3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
Grall.
4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
Baechle.
5) Fix race between rhashtable resize events and table walks, from Phil
Sutter.
6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
Daniel Borkmann.
7) Missing security layer socket state initialization in tipc code,
from Stephen Smalley.
8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
Denys Vlasenko.
9) Missing minor_idr destroy on module unload on macvtap driver, from
Johannes Thumshirn.
10) Various pktgen kernel thread races, from Oleg Nesterov.
11) Fix races that can cause packets to be processed in the backlog even
after a device attached to that SKB has been fully unregistered.
From Julian Anastasov.
12) bcmgenet driver doesn't account packet drops vs. errors properly,
fix from Petri Gynther.
13) Array index validation and off by one fix in DSA layer from Florian
Fainelli
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
can: replace timestamp as unique skb attribute
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
can: c_can: Fix default pinmux glitch at init
can: rcar_can: unify error messages
can: rcar_can: print request_irq() error code
can: rcar_can: fix typo in error message
can: rcar_can: print signed IRQ #
can: rcar_can: fix IRQ check
net: dsa: Fix off-by-one in switch address parsing
net: dsa: Test array index before use
net: switchdev: don't abort unsupported operations
net: bcmgenet: fix accounting of packet drops vs errors
cdc_ncm: update specs URL
Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
bridge: mdb: allow the user to delete mdb entry if there's a querier
net: call rcu_read_lock early in process_backlog
net: do not process device backlog during unregistration
bridge: fix potential crash in __netdev_pick_tx()
net: axienet: Fix devm_ioremap_resource return value check
...
Pull crypto fixes from Herbert Xu:
"This fixes a duplicate dma_unmap_sg call in omap-des and reentrancy
bugs in the powerpc nx driver which may cause bogus output or worse
memory corruption"
dm: fix use after free crash due to incorrect cleanup sequence
Linux 4.2-rc1 Commit 0f20972f7bf6 ("dm: factor out a common
cleanup_mapped_device()") moved a common cleanup code to a separate
function. Unfortunately, that commit incorrectly changed the order of
cleanup, so that it destroys the mapped_device's srcu structure
'io_barrier' before destroying its workqueue.
The function that is executed on the workqueue (dm_wq_work) uses the srcu
structure, thus it may use it after being freed. That results in a
crash in the LVM test suite's mirror-vgreduce-removemissing.sh test.
Signed-off-by: Mikulas Patocka <[email protected]> Fixes: 0f20972f7bf6 ("dm: factor out a common cleanup_mapped_device()") Signed-off-by: Mike Snitzer <[email protected]>
When setting yup the symbols library we setup several filter lists,
for dsos, comms, symbols, etc, and there is code that, if there are
filters, do certain operations, like recalculate the number of non
filtered histogram entries in the top/report TUI.
But they were considering just the "Zoom" filters, when they need to
take into account as well the above mentioned filters (perf top --comms,
--dsos, etc).
So store in symbol_conf.has_filter true if any of those filters is in
place.
Jeff Layton [Sat, 11 Jul 2015 10:43:03 +0000 (06:43 -0400)]
nfs4: have do_vfs_lock take an inode pointer
Now that we have file locking helpers that can deal with an inode
instead of a filp, we can change the NFSv4 locking code to use that
instead.
This should fix the case where we have a filp that is closed while flock
or OFD locks are set on it, and the task is signaled so that it doesn't
wait for the LOCKU reply to come in before the filp is freed. At that
point we can end up with a use-after-free with the current code, which
relies on dereferencing the fl_file in the lock request.
Jeff Layton [Sat, 11 Jul 2015 10:43:02 +0000 (06:43 -0400)]
locks: have flock_lock_file take an inode pointer instead of a filp
...and rename it to better describe how it works.
In order to fix a use-after-free in NFS, we need to be able to remove
locks from an inode after the filp associated with them may have already
been freed. flock_lock_file already only dereferences the filp to get to
the inode, so just change it so the callers do that.
All of the callers already pass in a lock request that has the fl_file
set properly, so we don't need to pass it in individually. With that
change it now only dereferences the filp to get to the inode, so just
push that out to the callers.
William reported that he was seeing instability with this patch, which
is likely due to the fact that it can cause the kernel to take a new
reference to a filp after the last reference has already been put.
Revert this patch for now, as we'll need to fix this in another way.
If a machine check happens, the machine has the vector facility installed
and the extended save area exists, the cpu will save vector register
contents into the extended save area. This is regardless of control
register 0 contents, which enables and disables the vector facility during
runtime.
On each machine check we should validate the vector registers. The current
code however tries to validate the registers only if the running task is
using vector registers in user space.
However even the current code is broken and causes vector register
corruption on machine checks, if user space uses them:
the prefix area contains a pointer (absolute address) to the machine check
extended save area. In order to save some space the save area was put into
an unused area of the second prefix page.
When validating vector register contents the code uses the absolute address
of the extended save area, which is wrong. Due to prefixing the vector
instructions will then access contents using absolute addresses instead
of real addresses, where the machine stored the contents.
If the above would work there is still the problem that register validition
would only happen if user space uses vector registers. If kernel space uses
them also, this may also lead to vector register content corruption:
if the kernel makes use of vector instructions, but the current running
user space context does not, the machine check handler will validate
floating point registers instead of vector registers.
Given the fact that writing to a floating point register may change the
upper halve of the corresponding vector register, we also experience vector
register corruption in this case.
Fix all of these issues, and always validate vector registers on each
machine check, if the machine has the vector facility installed and the
extended save area is defined.
The sfpc inline assembly within execve_tail() may incorrectly set bits
28-31 of the sfpc instruction to a value which is not zero.
These bits however are currently unused and therefore should be zero
so we won't get surprised if these bits will be used in the future.
Therefore remove the second operand from the inline assembly.
Stefan Haberland [Fri, 10 Jul 2015 08:47:09 +0000 (10:47 +0200)]
s390/dasd: fix kernel panic when alias is set offline
The dasd device driver selects which (alias or base) device is used
for a given requests when the request is build. If the chosen alias
device is set offline before the request gets queued to the device
queue the starting function may use device structures that are
already freed. This might lead to a hanging offline process or a
kernel panic.
Add a check to the starting function that returns the request to the
upper layer if the device is already in offline processing.
In addition to that prevent that an alias device that's already in
offline processing gets chosen as start device.
ARC: make sure instruction_pointer() returns unsigned value
Currently instruction_pointer() returns pt_regs->ret and so return value
is of type "long", which implicitly stands for "signed long".
While that's perfectly fine when dealing with 32-bit values if return
value of instruction_pointer() gets assigned to 64-bit variable sign
extension may happen.
And at least in one real use-case it happens already.
In perf_prepare_sample() return value of perf_instruction_pointer()
(which is an alias to instruction_pointer() in case of ARC) is assigned
to (struct perf_sample_data)->ip (which type is "u64").
And what we see if instuction pointer points to user-space application
that in case of ARC lays below 0x8000_0000 "ip" gets set properly with
leading 32 zeros. But if instruction pointer points to kernel address
space that starts from 0x8000_0000 then "ip" is set with 32 leadig
"f"-s. I.e. id instruction_pointer() returns 0x8100_0000, "ip" will be
assigned with 0xffff_ffff__8100_0000. Which is obviously wrong.
In particular that issuse broke output of perf, because perf was unable
to associate addresses like 0xffff_ffff__8100_0000 with anything from
/proc/kallsyms.
That's what we used to see:
----------->8----------
6.27% ls [unknown] [k] 0xffffffff8046c5cc
2.96% ls libuClibc-0.9.34-git.so [.] memcpy
2.25% ls libuClibc-0.9.34-git.so [.] memset
1.66% ls [unknown] [k] 0xffffffff80666536
1.54% ls libuClibc-0.9.34-git.so [.] 0x000224d6
1.18% ls libuClibc-0.9.34-git.so [.] 0x00022472
----------->8----------
With that change perf output looks much better now:
----------->8----------
8.21% ls [kernel.kallsyms] [k] memset
3.52% ls libuClibc-0.9.34-git.so [.] memcpy
2.11% ls libuClibc-0.9.34-git.so [.] malloc
1.88% ls libuClibc-0.9.34-git.so [.] memset
1.64% ls [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.41% ls [kernel.kallsyms] [k] __d_lookup_rcu
----------->8----------
yao mark [Thu, 2 Jul 2015 07:07:31 +0000 (15:07 +0800)]
drm/rockchip: vop: remove hardware cursor window
hardware cursor windows only have some fixed size, and not support
width virtual, when move hardware cursor windows outside of left,
the display would be wrong, so this window can't for cursor now.
And Tag hardware cursor window as a overlay is wrong, will make
userspace wrong behaviour.
Daniel Kurtz [Tue, 7 Jul 2015 09:03:36 +0000 (17:03 +0800)]
drm/rockchip: use drm_gem_mmap helpers
Rather than (incompletely [0]) re-implementing drm_gem_mmap() and
drm_gem_mmap_obj() helpers, call them directly from the rockchip mmap
routines.
Once the core functions return successfully, the rockchip mmap routines
can still use dma_mmap_attrs() to simply mmap the entire buffer.
[0] Previously, we were performing the mmap() without first taking a
reference on the underlying gem buffer. This could leak ptes if the gem
object is destroyed while userspace is still holding the mapping.
Heiko Stübner [Tue, 2 Jun 2015 14:41:45 +0000 (16:41 +0200)]
drm/rockchip: only call drm_fb_helper_hotplug_event if fb_helper present
Add a check for the presence of fb_helper to rockchip_drm_output_poll_changed()
to only call drm_fb_helper_hotplug_event if there is actually a fb_helper
available. Without this check I see NULL pointer dereferences when the
hdmi hotplug irq fires before the fb_helper got initialized.
Tomasz Figa [Mon, 11 May 2015 10:55:39 +0000 (19:55 +0900)]
drm/rockchip: Add BGR formats to VOP
VOP can support BGR formats in all windows thanks to red/blue swap option
provided in WINx_CTRL0 registers. This patch enables support for
ABGR8888, XBGR8888, BGR888 and BGR565 formats by using this feature.
David S. Miller [Mon, 13 Jul 2015 05:24:01 +0000 (22:24 -0700)]
Merge tag 'linux-can-fixes-for-4.2-20150712' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2015-07-12
this is a pull request of 8 patchs for net/master.
Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ
check and several info and error messages. There are two patches by J.D.
Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree,
which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better
approach to make the CAN skbs unique, the timestamp is replaced by a counter.
====================
The inb/outb/... family of IO methods end up being multiply defined when
building PCI support for the ColdFire. Compiling gives this:
CC init/main.o
In file included from ./arch/m68k/include/asm/io.h:4:0,
from include/linux/bio.h:30,
from include/linux/blkdev.h:18,
from init/main.c:75:
./arch/m68k/include/asm/io_mm.h:420:0: warning: "inb" redefined
./arch/m68k/include/asm/io_mm.h:108:0: note: this is the location of the previous definition
...
The ColdFire/PCI case defines its own IO access methods, so no others
should be defined or used in this case. Conditionally disable other
definitions that clash with it.
It would be nice if we could support multiple ColdFire SoC types in a
single binary - but currently the code simply does not support it.
Change the SoC selection config options to be a choice instead of
individual selectable entries.
This fixes problems with building allnoconfig, and means that a sane
linux kernel is generated for a single ColdFire SoC type.
m68knommu: improve the clock configuration defaults
Create some intelligent default settings for each ColdFire SoC type
in the configuration entry for CONFIG_CLOCK_FREQ.
The ColdFire clock frequency is configurable at build time. There is a
lot of variation in the frequency of operation on specific ColdFire based
boards. But we can choose a default that matches the maximum frequency
of clock operation for a particular ColdFire part. That is typically
the most common clock setting.
m68knommu: force setting of CONFIG_CLOCK_FREQ for ColdFire
It is possible to disable the clock selection at configuration time,
but for ColdFire targets we always expect a clock frequency to be
selected. This results in the following compile time error:
CC arch/m68k/kernel/asm-offsets.s
In file included from ./arch/m68k/include/asm/timex.h:14:0,
from include/linux/timex.h:65,
from include/linux/sched.h:19,
from arch/m68k/kernel/asm-offsets.c:14:
./arch/m68k/include/asm/coldfire.h:25:2: error: #error "Don't know what your ColdFire CPU clock frequency is??"
Remove CONFIG_CLOCK_SELECT completely and always enable CONFIG_CLOCK_FREQ
for ColdFire.
So the change to test 'crtc_state->base.active' cannot possibly be
correct as-is.
There may be some other minimal fix (like just checking crtc_state for
NULL), but I'm just reverting it now for the rc2 release, and people
like Daniel Vetter who actually know this code will figure out what the
right solution is in the longer term.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
"Fixes for this cycle regression in overlayfs and a couple of
long-standing (== all the way back to 2.6.12, at least) bugs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
freeing unlinked file indefinitely delayed
fix a braino in ovl_d_select_inode()
9p: don't leave a half-initialized inode sitting around
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS fixes from Ralf Baechle:
"A fair number of 4.2 fixes also because Markos opened the flood gates.
- Patch up the math used calculate the location for the page bitmap.
- The FDC (Not what you think, FDC stands for Fast Debug Channel) IRQ
around was causing issues on non-Malta platforms, so move the code
to a Malta specific location.
- A spelling fix replicated through several files.
- Fix to the emulation of an R2 instruction for R6 cores.
- Fix the JR emulation for R6.
- Further patching of mindless 64 bit issues.
- Ensure the kernel won't crash on CPUs with L2 caches with >= 8
ways.
- Use compat_sys_getsockopt for O32 ABI on 64 bit kernels.
- Fix cache flushing for multithreaded cores.
- A build fix"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: O32: Use compat_sys_getsockopt.
MIPS: c-r4k: Extend way_string array
MIPS: Pistachio: Support CDMM & Fast Debug Channel
MIPS: Malta: Make GIC FDC IRQ workaround Malta specific
MIPS: c-r4k: Fix cache flushing for MT cores
Revert "MIPS: Kconfig: Disable SMP/CPS for 64-bit"
MIPS: cps-vec: Use macros for various arithmetics and memory operations
MIPS: kernel: cps-vec: Replace KSEG0 with CKSEG0
MIPS: kernel: cps-vec: Use ta0-ta3 pseudo-registers for 64-bit
MIPS: kernel: cps-vec: Replace mips32r2 ISA level with mips64r2
MIPS: kernel: cps-vec: Replace 'la' macro with PTR_LA
MIPS: kernel: smp-cps: Fix 64-bit compatibility errors due to pointer casting
MIPS: Fix erroneous JR emulation for MIPS R6
MIPS: Fix branch emulation for BLTC and BGEC instructions
MIPS: kernel: traps: Fix broken indentation
MIPS: bootmem: Don't use memory holes for page bitmap
MIPS: O32: Do not handle require 32 bytes from the stack to be readable.
MIPS, CPUFREQ: Fix spelling of Institute.
MIPS: Lemote 2F: Fix build caused by recent mass rename.
Oliver Hartkopp [Fri, 26 Jun 2015 09:58:19 +0000 (11:58 +0200)]
can: replace timestamp as unique skb attribute
Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.
Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.
This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.
This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.
Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.
Roger Quadros [Tue, 7 Jul 2015 14:27:57 +0000 (17:27 +0300)]
ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
Driver core sets "default" pinmux on on probe and CAN driver
sets "sleep" pinmux during register. This causes a small window
where the CAN pins are in "default" state with the DCAN module
being disabled.
Change the "default" state to be like sleep so this glitch is
avoided. Add a new "active" state that is used by the driver
when CAN is actually active.
The previous change 3973c526ae9c (net: can: c_can: Disable pins when CAN
interface is down) causes a slight glitch on the pinctrl settings when used.
Since commit ab78029 (drivers/pinctrl: grab default handles from device core),
the device core will automatically set the default pins. This causes the pins
to be momentarily set to the default and then to the sleep state in
register_c_can_dev(). By adding an optional "enable" state, boards can set the
default pin state to be disabled and avoid the glitch when the switch from
default to sleep first occurs. If the "enable" state is not available
c_can_pinctrl_select_state() falls back to using the "default" pinctrl state.
[Roger Q] - Forward port to v4.2 and use pinctrl_get_select().