linux.git
10 years agoRDMA/ocrdma: Always resolve destination mac from GRH for UD QPs
Devesh Sharma [Mon, 24 Nov 2014 04:33:39 +0000 (10:03 +0530)]
RDMA/ocrdma: Always resolve destination mac from GRH for UD QPs

For user applications that use UD QPs, always resolve destination MAC
from the GRH.  This is to avoid failure due to any garbage value in
the attr->dmac.

Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/ocrdma: Fix ocrdma_query_qp() to report q_key value for UD QPs
Mitesh Ahuja [Wed, 3 Dec 2014 06:06:33 +0000 (11:36 +0530)]
RDMA/ocrdma: Fix ocrdma_query_qp() to report q_key value for UD QPs

Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com>
Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agomlx4_core: Check for DPDP violation only when DPDP is not supported
Yuval Shaia [Sat, 13 Dec 2014 18:18:40 +0000 (10:18 -0800)]
mlx4_core: Check for DPDP violation only when DPDP is not supported

Move check for DPDP out of the loop to make the code more readable.

Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mr
Jack Morgenstein [Sun, 2 Nov 2014 08:23:19 +0000 (10:23 +0200)]
IB/mlx4: Fix an incorrectly shadowed variable in mlx4_ib_rereg_user_mr

This error was detected by sparse static checker:

    drivers/infiniband/hw/mlx4/mr.c:226:21: warning: symbol 'err' shadows an earlier one
    drivers/infiniband/hw/mlx4/mr.c:197:13: originally declared here

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Bump version to 1.5
Or Gerlitz [Sun, 7 Dec 2014 14:10:07 +0000 (16:10 +0200)]
IB/iser: Bump version to 1.5

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: DIX update
Sagi Grimberg [Sun, 7 Dec 2014 14:10:06 +0000 (16:10 +0200)]
IB/iser: DIX update

Following few recent Block integrity updates, we align the iSER data
integrity offload settings with:

- Deprecate pi_guard module param
- Expose support for DIX type 0.
- Use scsi_transfer_length for the transfer length
- Get pi_interval, ref_tag, ref_remap, bg_type and
  check_mask setting from scsi_cmnd

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Micro-optimize iser_handle_wc
Sagi Grimberg [Sun, 7 Dec 2014 14:10:05 +0000 (16:10 +0200)]
IB/iser: Micro-optimize iser_handle_wc

Use likely() for wc.status == IB_WC_SUCCESS

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Micro-optimize iser logging
Sagi Grimberg [Sun, 7 Dec 2014 14:10:04 +0000 (16:10 +0200)]
IB/iser: Micro-optimize iser logging

And fix a checkpatch warning.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Use more completion queues
Sagi Grimberg [Sun, 7 Dec 2014 14:10:03 +0000 (16:10 +0200)]
IB/iser: Use more completion queues

No reason to settle with four, can use the min between device max comp
vectors and number of cores.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Remove redundant is_mr indicator
Sagi Grimberg [Sun, 7 Dec 2014 14:10:02 +0000 (16:10 +0200)]
IB/iser: Remove redundant is_mr indicator

It is enough to check mem_h pointer assignment, mem_h == NULL will
indicate that buffer is not registered using mr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Centralize memory region invalidation to a function
Sagi Grimberg [Sun, 7 Dec 2014 14:10:01 +0000 (16:10 +0200)]
IB/iser: Centralize memory region invalidation to a function

Eliminates code duplication.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Terminate connection before cleaning inflight tasks
Sagi Grimberg [Sun, 7 Dec 2014 14:10:00 +0000 (16:10 +0200)]
IB/iser: Terminate connection before cleaning inflight tasks

When closing the connection, we should first terminate the connection
(in case it was not previously terminated) to guarantee the QP is in
error state and we are done with servicing IO. Only then go ahead with
tasks cleanup via iscsi_conn_stop.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Fix race between iser connection teardown and scsi TMFs
Sagi Grimberg [Sun, 7 Dec 2014 14:09:59 +0000 (16:09 +0200)]
IB/iser: Fix race between iser connection teardown and scsi TMFs

In certain scenarios (target kill with live IO) scsi TMFs may race
with iser RDMA teardown, which might cause NULL dereference on iser IB
device handle (which might have been freed). In this case we take a
conditional lock for TMFs and check the connection state (avoid
introducing lock contention in the IO path). This is indeed best
effort approach, but sufficient to survive multi targets sudden death
while heavy IO is inflight.

While we are on it, add a nice kernel-doc style documentation.

Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Fix possible NULL derefernce ib_conn->device in session_create
Ariel Nahum [Sun, 7 Dec 2014 14:09:58 +0000 (16:09 +0200)]
IB/iser: Fix possible NULL derefernce ib_conn->device in session_create

If rdma_cm error event comes after ep_poll but before conn_bind, we
should protect against dereferncing the device (which may have been
terminated) in session_create and conn_create (already protected)
callbacks.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Fix sparse warnings
Sagi Grimberg [Sun, 7 Dec 2014 14:09:57 +0000 (16:09 +0200)]
IB/iser: Fix sparse warnings

Use uintptr_t to handle wr_id casting, which was found by Kbuild test
robot and smatch.  Also remove an internal definition of variable which
potentially shadows an external one (and make sparse happy).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Fix possible SQ overflow
Max Gurtovoy [Sun, 7 Dec 2014 14:09:56 +0000 (16:09 +0200)]
IB/iser: Fix possible SQ overflow

Fix a regression was introduced in commit 6df5a128f0fd ("IB/iser:
Suppress scsi command send completions").

The sig_count was wrongly set to be static variable, thus it is
possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0
condition (due to races) and the send queue will be overflowed.

Instead keep sig_count per connection. We don't need it to be atomic
as we are safe under the iscsi session frwd_lock taken by libiscsi on
the queuecommand path.

Fixes: 6df5a128f0fd ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Decrement CQ's active QPs accounting when QP creation fails
Sagi Grimberg [Sun, 7 Dec 2014 14:09:55 +0000 (16:09 +0200)]
IB/iser: Decrement CQ's active QPs accounting when QP creation fails

When creating a connection QP we choose the least used CQ and inc the
number of active QPs on that. If we fail to create the QP, we need to
decrement the active QPs counter.

Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Collapse cleanup and disconnect handlers
Ariel Nahum [Sun, 7 Dec 2014 14:09:54 +0000 (16:09 +0200)]
IB/iser: Collapse cleanup and disconnect handlers

No real need to wait for TIMEWAIT_EXIT before we destroy the RDMA
resources (also TIMEAWAIT_EXIT is not guarenteed to always arrive).  As
for the cma_id, only destroy it if the state is not DOWN where in this
case, conn_release is already running and we don't want to compete.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Fix catastrophic error flow hang
Sagi Grimberg [Sun, 7 Dec 2014 14:09:53 +0000 (16:09 +0200)]
IB/iser: Fix catastrophic error flow hang

In case of the HCA going into catasrophic error flow, the
beacon post_send is likely to fail, so surely there will
be no completion for it.

In this case, use a best effort approach and don't wait for beacon
completion if we failed to post the send.

Reported-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/iser: Re-adjust CQ and QP send ring sizes to HW limits
Minh Tran [Sun, 7 Dec 2014 14:09:52 +0000 (16:09 +0200)]
IB/iser: Re-adjust CQ and QP send ring sizes to HW limits

Re-adjust max CQEs per CQ and max send_wr per QP according
to the resource limits supported by underlying hardware.

Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: No longer use flush as a parameter
Doug Ledford [Wed, 10 Dec 2014 16:47:05 +0000 (11:47 -0500)]
IPoIB: No longer use flush as a parameter

Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue.  Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: Make ipoib_mcast_stop_thread flush the workqueue
Doug Ledford [Wed, 10 Dec 2014 16:47:04 +0000 (11:47 -0500)]
IPoIB: Make ipoib_mcast_stop_thread flush the workqueue

We used to pass a flush variable to mcast_stop_thread to indicate if
we should flush the workqueue or not.  This was due to some code
trying to flush a workqueue that it was currently running on which is
a no-no.  Now that we have per-device work queues, and now that
ipoib_mcast_restart_task has taken the fact that it is queued on a
single thread workqueue with all of the ipoib_mcast_join_task's and
therefore has no need to stop the join task while it runs, we can do
away with the flush parameter and unilaterally flush always.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: Use dedicated workqueues per interface
Doug Ledford [Wed, 10 Dec 2014 16:47:03 +0000 (11:47 -0500)]
IPoIB: Use dedicated workqueues per interface

During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up.  It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue, we can
have one device in the middle of a down and holding the rtnl lock and
another totally unrelated device needing to run mcast_restart_task,
which wants the rtnl lock and will loop trying to take it unless is
sees its own FLAG_ADMIN_UP flag go away.  Because the unrelated
interface will never see its own ADMIN_UP flag drop, the interface
going down will deadlock trying to flush the queue.  There are several
possible solutions to this problem:

Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail.  This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.

Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail.  Again, this can
drop work on the floor.  I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our
own task back up as a delayed work and return and avoid dropping work
on the floor that way.  But I'm not 100% convinced that we won't cause
other problems.

Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface.  In addition, keep the global
workqueue, but now limit it to only flush tasks.  In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: change init sequence ordering
Doug Ledford [Wed, 10 Dec 2014 16:47:02 +0000 (11:47 -0500)]
IPoIB: change init sequence ordering

In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: fix mcast_dev_flush/mcast_restart_task race
Doug Ledford [Wed, 10 Dec 2014 16:47:01 +0000 (11:47 -0500)]
IPoIB: fix mcast_dev_flush/mcast_restart_task race

Our mcast_dev_flush routine and our mcast_restart_task can race
against each other.  In particular, they both hold the priv->lock
while manipulating the rbtree and while removing mcast entries from
the multicast_list and while adding entries to the remove_list, but
they also both drop their locks prior to doing the actual removes.
The mcast_dev_flush routine is run entirely under the rtnl lock and so
has at least some locking.  The actual race condition is like this:

Thread 1                                Thread 2
ifconfig ib0 up
  start multicast join for broadcast
  multicast join completes for broadcast
  start to add more multicast joins
    call mcast_restart_task to add new entries
                                        ifconfig ib0 down
  mcast_dev_flush
    mcast_leave(mcast A)
    mcast_leave(mcast A)

As mcast_leave calls ib_sa_multicast_leave, and as member in
core/multicast.c is ref counted, we run into an unbalanced refcount
issue.  To avoid stomping on each others removes, take the rtnl lock
specifically when we are deleting the entries from the remove list.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: fix MCAST_FLAG_BUSY usage
Doug Ledford [Wed, 10 Dec 2014 16:47:00 +0000 (11:47 -0500)]
IPoIB: fix MCAST_FLAG_BUSY usage

Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
in how it was used.  We didn't always initialize the completion struct
before we set the flag, and we didn't always call complete on the
completion struct from all paths that complete it.  This made it less
than totally effective, and certainly made its use confusing.  And in
the flush function we would use the presence of this flag to signal
that we should wait on the completion struct, but we never cleared
this flag, ever.  This is further muddied by the fact that we overload
the MCAST_FLAG_BUSY flag to mean two different things: we have a join
in flight, and we have succeeded in getting an ib_sa_join_multicast.

In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.

 1) Remove the MCAST_JOIN_STARTED flag entirely
 2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight
 3) Test on mcast->mc directly to see if we have completed
    ib_sa_join_multicast (using IS_ERR_OR_NULL)
 4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
    the mcast->done completion struct
 5) Make sure that before calling complete(&mcast->done), we always clear
    the MCAST_FLAG_BUSY bit
 6) Take the mcast_mutex before we call ib_sa_multicast_join and also
    take the mutex in our join callback.  This forces
    ib_sa_multicast_join to return and set mcast->mc before we process
    the callback.  This way, our callback can safely clear mcast->mc
    if there is an error on the join and we will do the right thing as
    a result in mcast_dev_flush.
 7) Because we need the mutex to synchronize mcast->mc, we can no
    longer call mcast_sendonly_join directly from mcast_send and
    instead must add sendonly join processing to the mcast_join_task

A number of different races are resolved with these changes.  These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.

One race looks something like this:

Thread 1                             Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
  alloc member
  call callback
                                     ifconfig ib0 down
     wait_for_completion
    callback call completes
                                     wait_for_completion in
     mcast_dev_flush completes
       mcast->mc is PTR_ERR_OR_NULL
       so we skip ib_sa_leave_multicast
    return from callback
  return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast

We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c

Another like this:

Thread 1                   Thread 2         Thread 3
ib_sa_multicast_join
                                            ifconfig ib0 down
    priv->broadcast = NULL
                           join_complete
                    wait_for_completion
   mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
   complete
   return -EAGAIN (making mcast->mc invalid)
        call ib_sa_multicast_leave
    on invalid mcast->mc, hang
    forever

By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we
run the callback.  This allows us to clear mcast->mc if there is an
error and the join is going to fail.  We do this before we complete
the mcast.  In this way, mcast_dev_flush always sees consistent state
in regards to mcast->mc membership at the time that the
wait_for_completion() returns.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: Make the carrier_on_task race aware
Doug Ledford [Wed, 10 Dec 2014 16:46:59 +0000 (11:46 -0500)]
IPoIB: Make the carrier_on_task race aware

We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface.  Unfortunately, that's not
the case.  In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task,
the flush_workqueue waits on this thread too.  But this thread is
deadlocked on the rtnl lock.  The better thing here is to use trylock
and loop on that until we either get the lock or we see that
FLAG_ADMIN_UP has been cleared, in which case we don't need to do
anything anyway and we just return.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIPoIB: Consolidate rtnl_lock tasks in workqueue
Doug Ledford [Wed, 10 Dec 2014 16:46:58 +0000 (11:46 -0500)]
IPoIB: Consolidate rtnl_lock tasks in workqueue

Setting the MTU can safely be moved to the carrier_on_task, which keeps
us from needing to take the rtnl lock in the join_finish section.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cxgb4: Handle NET_XMIT return codes
Hariprasad S [Mon, 8 Dec 2014 09:32:47 +0000 (15:02 +0530)]
RDMA/cxgb4: Handle NET_XMIT return codes

cxgb4_create_server() and cxgb4_create_server6() return NET_XMIT_*
values or a negative errno. iw_cxgb4 need to handle this correctly.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cxgb4: Wake up waiters after flushing the qp
Steve Wise [Fri, 21 Nov 2014 15:36:36 +0000 (09:36 -0600)]
RDMA/cxgb4: Wake up waiters after flushing the qp

When transitioning into ERROR state, the QP was getting flushed after
waking up any waiters.  This can cause applications to miss flushed work
requests which can stall an NFS mount.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices
Hariprasad Shenai [Fri, 21 Nov 2014 15:36:36 +0000 (09:36 -0600)]
RDMA/cxgb4: Limit MRs to < 8GB for T4/T5 devices

T4/T5 hardware can't handle MRs >= 8GB due to a hardware bug.  So limit
registrations to < 8GB for thse devices.

Based on original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cxgb4: Fix locking issue in process_mpa_request
Hariprasad Shenai [Fri, 21 Nov 2014 15:36:35 +0000 (09:36 -0600)]
RDMA/cxgb4: Fix locking issue in process_mpa_request

Fix the following lockdep report:

    =============================================
    [ INFO: possible recursive locking detected ]
    3.17.0+ #3 Tainted: G            E
    ---------------------------------------------
    kworker/u64:3/299 is trying to acquire lock:
     (&epc->mutex){+.+.+.}, at: [<ffffffffa074e07a>]
    process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]

    but task is already holding lock:
     (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0 [iw_cxgb4]

    other info that might help us debug this:
     Possible unsafe locking scenario:

           CPU0
           ----
      lock(&epc->mutex);
      lock(&epc->mutex);

     *** DEADLOCK ***

     May be due to missing lock nesting notation

    3 locks held by kworker/u64:3/299:
     #0:  ("%s""iw_cxgb4"){.+.+.+}, at: [<ffffffff8106f14d>]
    process_one_work+0x13d/0x4d0
     #1:  (skb_work){+.+.+.}, at: [<ffffffff8106f14d>] process_one_work+0x13d/0x4d0
     #2:  (&epc->mutex){+.+.+.}, at: [<ffffffffa074e34e>] rx_data+0x9e/0x1f0
    [iw_cxgb4]

    stack backtrace:
    CPU: 2 PID: 299 Comm: kworker/u64:3 Tainted: G            E  3.17.0+ #3
    Hardware name: Dell Inc. PowerEdge T110/0X744K, BIOS 1.2.1 01/28/2010
    Workqueue: iw_cxgb4 process_work [iw_cxgb4]
     ffff8800b91593d0 ffff8800b8a2f9f8 ffffffff815df107 0000000000000001
     ffff8800b9158750 ffff8800b8a2fa28 ffffffff8109f0e2 ffff8800bb768a00
     ffff8800b91593d0 ffff8800b9158750 0000000000000000 ffff8800b8a2fa88
    Call Trace:
     [<ffffffff815df107>] dump_stack+0x49/0x62
     [<ffffffff8109f0e2>] print_deadlock_bug+0xf2/0x100
     [<ffffffff810a0f04>] validate_chain+0x454/0x700
     [<ffffffff810a1574>] __lock_acquire+0x3c4/0x580
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff810a17cc>] lock_acquire+0x9c/0x110
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff815e111b>] mutex_lock_nested+0x4b/0x360
     [<ffffffffa074e07a>] ? process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffff810c181a>] ? del_timer_sync+0xaa/0xd0
     [<ffffffff810c1770>] ? try_to_del_timer_sync+0x70/0x70
     [<ffffffffa074e07a>] process_mpa_request+0x1aa/0x3e0 [iw_cxgb4]
     [<ffffffffa074a3ec>] ? update_rx_credits+0xec/0x140 [iw_cxgb4]
     [<ffffffffa074e381>] rx_data+0xd1/0x1f0 [iw_cxgb4]
     [<ffffffff8109ff23>] ? mark_held_locks+0x73/0xa0
     [<ffffffff815e4b90>] ? _raw_spin_unlock_irqrestore+0x40/0x70
     [<ffffffff810a020d>] ? trace_hardirqs_on_caller+0xfd/0x1c0
     [<ffffffff810a02dd>] ? trace_hardirqs_on+0xd/0x10
     [<ffffffffa074c931>] process_work+0x51/0x80 [iw_cxgb4]
     [<ffffffff8106f1c8>] process_one_work+0x1b8/0x4d0
     [<ffffffff8106f14d>] ? process_one_work+0x13d/0x4d0
     [<ffffffff8106f600>] worker_thread+0x120/0x3c0
     [<ffffffff8106f4e0>] ? process_one_work+0x4d0/0x4d0
     [<ffffffff81074a0e>] kthread+0xde/0x100
     [<ffffffff815e4b40>] ? _raw_spin_unlock_irq+0x30/0x40
     [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70
     [<ffffffff815e512c>] ret_from_fork+0x7c/0xb0
     [<ffffffff81074930>] ? __init_kthread_worker+0x70/0x70

Based on original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cxgb4: Configure 0B MRs to match HW implementation
Pramod Kumar [Fri, 21 Nov 2014 15:36:35 +0000 (09:36 -0600)]
RDMA/cxgb4: Configure 0B MRs to match HW implementation

0B MRs need some tweaks to work correctly with HW. When writing the
TPTE, if the MR length is zero we now:

1) turn off all permissions
2) set the length to -1

While functionality/capabilities of the MR are the same with these
changes, it resolves a dapltest 0B RDMA Read test failure.  Based on
original work by Steve Wise <swise@opengridcomputing.com>.

Signed-off-by: Pramod Kumar <pramod@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoRDMA/cxgb4: Increase epd buff size for debug interface
Pramod Kumar [Fri, 21 Nov 2014 15:36:35 +0000 (09:36 -0600)]
RDMA/cxgb4: Increase epd buff size for debug interface

IPv6 address string lengths require increasing the buffer size for
debugfs handlers.

Signed-off-by: Pramod Kumar <pramod@chelsio.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/addr: Improve address resolution callback scheduling
Or Kehati [Wed, 29 Oct 2014 14:32:04 +0000 (16:32 +0200)]
IB/addr: Improve address resolution callback scheduling

Address resolution always does a context switch to a work-queue to
deliver the address resolution event.  When the IP address is already
cached in the system ARP table, we're going through the following:
chain:

    rdma_resolve_ip --> addr_resolve (cache hit) -->

which ends up with:

    queue_req --> set_timeout (now) --> mod_delayed_work(,, delay=1)

We actually do realize that the timeout should be zero, but the code
forces it to a minimum of one jiffie.

Using one jiffie as the minimum delay value results in sub-optimal
scheduling of executing this work item by the workqueue, which on the
below testbed costs about 3-4ms out of 12ms total time.

To fix that, we let the minimum delay to be zero.  Note that the
connect step times change too, as there are address resolution calls
from that flow.

The results were taken from running both client and server on the
same node, over mlx4 RoCE port.

before -->
step              total ms     max ms     min us  us / conn
create id    :        0.01       0.01       6.00       6.00
resolve addr :        4.02       4.01    4013.00    4016.00
resolve route:        0.18       0.18     182.00     183.00
create qp    :        1.15       1.15    1150.00    1150.00
connect      :        6.73       6.73    6730.00    6731.00
disconnect   :        0.55       0.55     549.00     550.00
destroy      :        0.01       0.01       9.00       9.00

after -->
step              total ms     max ms     min us  us / conn
create id    :        0.01       0.01       6.00       6.00
resolve addr :        0.05       0.05      49.00      52.00
resolve route:        0.21       0.21     207.00     208.00
create qp    :        1.10       1.10    1104.00    1104.00
connect      :        1.22       1.22    1220.00    1221.00
disconnect   :        0.71       0.71     713.00     713.00
destroy      :        0.01       0.01       9.00       9.00

Signed-off-by: Or Kehati <ork@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/core: Fix mgid key handling in SA agent multicast data-base
Jack Morgenstein [Wed, 19 Nov 2014 09:08:58 +0000 (11:08 +0200)]
IB/core: Fix mgid key handling in SA agent multicast data-base

Applications can request that the SM assign an MGID by passing a mcast
member request containing MGID = 0. When the SM responds by sending
the allocated MGID, this MGID replaces the 0-MGID in the multicast group.

However, the MGID field in the group is also the key field in the IB
core multicast code rbtree containing the multicast groups for the
port.

Since this is a key field, correct handling requires that the group
entry be deleted from the rbtree and then re-inserted with the new
key, so that the table structure is properly maintained.

The current code does not do this correctly.  Correct operation
requires that if the key-field gid has changed at all, it should be
deleted and re-inserted.

Note that when inserting, if the new MGID is zero (not the case here
but the code should handle this correctly), we allow duplicate entries
for 0-MGIDs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoIB/core: Do not resolve VLAN if already resolved
Moni Shoua [Wed, 19 Nov 2014 09:28:18 +0000 (11:28 +0200)]
IB/core: Do not resolve VLAN if already resolved

For RoCE, resolution of layer 2 address attributes forces no VLAN if
link-local GIDs are used.  This patch allows applications to choose
the VLAN ID for link-local based RoCE GIDs by setting IB_QP_VID in
their QP attribute mask, and prevents the core from overriding this
choice.

Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
10 years agoMerge tag 'staging-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Tue, 16 Dec 2014 02:06:13 +0000 (18:06 -0800)]
Merge tag 'staging-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver updates from Greg KH:
 "Here's the big staging tree pull request for 3.19-rc1.

  We continued to delete more lines than were added, always a good
  thing, but not at a huge rate this release, only about 70k lines
  removed overall mostly from removing the horrid bcm driver.

  Lots of normal staging driver cleanups and fixes all over the place,
  well over a thousand of them, the shortlog shows all the horrid
  details.

  The "contentious" thing here is the movement of the Android binder
  code out of staging into the "real" part of the kernel.  This is code
  that has been stable for a few years now and is working as-is in the
  tens of millions of devices with no issues.  Yes, the code is horrid,
  and the userspace api leaves a lot to be desired, but it's not going
  to change due to legacy issues that we have no control over.  Because
  so many devices and companies rely on this, and the code is stable,
  might as well promote it out of staging.

  This was all discussed at the Linux Plumbers conference, and everyone
  participating agreed that this was the best way forward.

  There is work happening to replace the binder code with something new
  that is happening right now, but I don't expect to see the results of
  that work for another year at the earliest.  If that ever happens, and
  Android switches over to it, I'll gladly remove this version.

  As for maintainers, I'll be glad to maintain this code, I've been
  doing it for the past few years with no problems.  I'll send a
  MAINTAINERS entry for it before 3.19-final is out, still need to talk
  to the Google developers about if they are willing to help with it or
  not, last I checked they were, which was good.

  All of these patches have been in linux-next for a while with no
  reported issues"

* tag 'staging-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1382 commits)
  Staging: slicoss: Fix long line issues in slicoss.c
  staging: rtl8712: remove unnecessary else after return
  staging: comedi: change some printk calls to pr_err
  staging: rtl8723au: hal: Removed the extra semicolon
  lustre: Deletion of unnecessary checks before three function calls
  staging: lustre: fix sparse warnings: static function declaration
  staging: lustre: fixed sparse warnings related to static declarations
  staging: unisys: remove duplicate header
  staging: unisys: remove unneeded structure
  staging: ft1000 : replace __attribute ((__packed__) with __packed
  drivers: staging: rtl8192e: Include "asm/unaligned.h" instead of "access_ok.h" in "rtl819x_BAProc.c"
  Drivers:staging:rtl8192e: Fixed checkpatch warning
  Drivers:staging:clocking-wizard: Added a newline
  staging: clocking-wizard: check for a valid clk_name pointer
  staging: rtl8723au: Hal_InitPGData() avoid unnecessary typecasts
  staging: rtl8723au: _DisableAnalog(): Avoid zero-init variables unnecessarily
  staging: rtl8723au: Remove unnecessary wrapper _ResetDigitalProcedure1()
  staging: rtl8723au: _ResetDigitalProcedure1_92C() reduce code obfuscation
  staging: rtl8723au: Remove unnecessary wrapper _DisableRFAFEAndResetBB()
  staging: rtl8723au: _DisableRFAFEAndResetBB8192C(): Reduce code obfuscation
  ...

10 years agoMerge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux...
James Morris [Tue, 16 Dec 2014 01:49:10 +0000 (12:49 +1100)]
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity into for-linus

10 years agoMerge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee139...
Linus Torvalds [Tue, 16 Dec 2014 01:40:28 +0000 (17:40 -0800)]
Merge tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

Pull firewire updates from Stefan Richter:
 "IEEE 1394 subsystem updates:
   - clean up firewire-ohci's longlived vm-mapping
   - use target instance lock instead of core lock in firewire-sbp2"

* tag 'firewire-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
  firewire: sbp2: replace card lock by target lock
  firewire: sbp2: replace some spin_lock_irqsave by spin_lock_irq
  firewire: sbp2: protect a reference counter properly
  firewire: core: document fw_csr_string's truncation of long strings
  firewire: ohci: replace vm_map_ram() with vmap()

10 years agoMerge tag 'for-v3.19' of git://git.infradead.org/battery-2.6
Linus Torvalds [Tue, 16 Dec 2014 01:36:45 +0000 (17:36 -0800)]
Merge tag 'for-v3.19' of git://git.infradead.org/battery-2.6

Pull power supply updates from Sebastian Reichel::
 "Power supply and reset changes for the v3.19 series

   - update power/reset drivers to use kernel restart handler
   - add power off driver for i.mx6
   - add DT support for gpio-charger"

* tag 'for-v3.19' of git://git.infradead.org/battery-2.6:
  power: reset: adjust priority of simple syscon reboot driver
  power: ds2782_battery: Simplify the PM hooks
  power/reset: brcmstb: Register with kernel restart handler
  power/reset: hisi: Register with kernel restart handler
  power/reset: keystone: Register with kernel restart handler
  power/reset: axxia: Register with kernel restart handler
  power/reset: xgene: Register with kernel restart handler
  power/reset: xgene: Use mdelay instead of jiffies based timeout
  power/reset: xgene: Use local variable dev instead of pdev->dev
  power/reset: xgene: Drop devm_kfree
  power/reset: xgene: Return -ENOMEM if out of memory
  power/reset: vexpress: Register with kernel restart handler
  power: reset: imx-snvs-poweroff: add power off driver for i.mx6
  power: gpio-charger: add device tree support
  dt-bindings: document gpio-charger bindings

10 years agoMerge tag 'hsi-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi
Linus Torvalds [Tue, 16 Dec 2014 01:33:47 +0000 (17:33 -0800)]
Merge tag 'hsi-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi

Pull HSI update from Sebastian Reichel:
 "Misc fixes in omap-ssi and nokia-modem drivers"

* tag 'hsi-for-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi:
  HSI: nokia-modem: fix error handling of irq_of_parse_and_map
  HSI: nokia-modem: setup default value for pm parameter
  HSI: omap_ssi_port: Don't print uninitialized err
  HSI: remove deprecated IRQF_DISABLED

10 years agoMerge branch 'irq-irqdomain-arm-for-linus' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Tue, 16 Dec 2014 01:30:09 +0000 (17:30 -0800)]
Merge branch 'irq-irqdomain-arm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq domain ARM updates from Thomas Gleixner:
 "This set of changes make use of hierarchical irqdomains to provide:

   - MSI/ITS support for GICv3
   - MSI support for GICv2m
   - Interrupt polarity extender for GICv1

  Marc has come more cleanups for the existing extension hooks of GIC in
  the pipeline, but they are going to be 3.20 material"

* 'irq-irqdomain-arm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  irqchip: gicv3-its: Fix ITT allocation
  irqchip: gicv3-its: Move some alloc/free code to activate/deactivate
  irqchip: gicv3-its: Fix domain free in multi-MSI case
  irqchip: gic: Remove warning by including linux/irqdomain.h
  irqchip: gic-v2m: Add DT bindings for GICv2m
  irqchip: gic-v2m: Add support for ARM GICv2m MSI(-X) doorbell
  irqchip: mtk-sysirq: dt-bindings: Add bindings for mediatek sysirq
  irqchip: mtk-sysirq: Add sysirq interrupt polarity support
  irqchip: gic: Support hierarchy irq domain.
  irqchip: GICv3: Binding updates for ITS
  irqchip: GICv3: ITS: enable compilation of the ITS driver
  irqchip: GICv3: ITS: plug ITS init into main GICv3 code
  irqchip: GICv3: ITS: DT probing and initialization
  irqchip: GICv3: ITS: MSI support
  irqchip: GICv3: ITS: device allocation and configuration
  irqchip: GICv3: ITS: tables allocators
  irqchip: GICv3: ITS: LPI allocator
  irqchip: GICv3: ITS: irqchip implementation
  irqchip: GICv3: ITS command queue
  irqchip: GICv3: rework redistributor structure
  ...

10 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Tue, 16 Dec 2014 01:25:20 +0000 (17:25 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull core block fix from Jens Axboe:
 "Jan reported a problem this morning with a crash in blk-mq, and after
  looking over the recent changes, it's obvious that the blk-mq-tag
  waitqueue handling change is buggy.  We could end up _not_ doing
  finish_wait() before switching to a new waitqueue, thus corrupting the
  wait task list"

* 'for-linus' of git://git.kernel.dk/linux-block:
  Revert "blk-mq: Micro-optimize bt_get()"

10 years agoMerge tag 'rpmsg-3.19-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad...
Linus Torvalds [Tue, 16 Dec 2014 01:07:58 +0000 (17:07 -0800)]
Merge tag 'rpmsg-3.19-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg

Pull rpmsg update from Ohad Ben-Cohen:
 "A single patch from Suman Anna which makes rpmsg use less buffers when
  small vrings are being used"

* tag 'rpmsg-3.19-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
  rpmsg: use less buffers when vrings are small

10 years agoMerge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Mon, 15 Dec 2014 23:52:01 +0000 (15:52 -0800)]
Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux

Pull drm updates from Dave Airlie:
 "Highlights:

   - AMD KFD driver merge

     This is the AMD HSA interface for exposing a lowlevel interface for
     GPGPU use.  They have an open source userspace built on top of this
     interface, and the code looks as good as it was going to get out of
     tree.

   - Initial atomic modesetting work

     The need for an atomic modesetting interface to allow userspace to
     try and send a complete set of modesetting state to the driver has
     arisen, and been suffering from neglect this past year.  No more,
     the start of the common code and changes for msm driver to use it
     are in this tree.  Ongoing work to get the userspace ioctl finished
     and the code clean will probably wait until next kernel.

   - DisplayID 1.3 and tiled monitor exposed to userspace.

     Tiled monitor property is now exposed for userspace to make use of.

   - Rockchip drm driver merged.

   - imx gpu driver moved out of staging

  Other stuff:

   - core:
        panel - MIPI DSI + new panels.
        expose suggested x/y properties for virtual GPUs

   - i915:
        Initial Skylake (SKL) support
        gen3/4 reset work
        start of dri1/ums removal
        infoframe tracking
        fixes for lots of things.

   - nouveau:
        tegra k1 voltage support
        GM204 modesetting support
        GT21x memory reclocking work

   - radeon:
        CI dpm fixes
        GPUVM improvements
        Initial DPM fan control

   - rcar-du:
        HDMI support added
        removed some support for old boards
        slave encoder driver for Analog Devices adv7511

   - exynos:
        Exynos4415 SoC support

   - msm:
        a4xx gpu support
        atomic helper conversion

   - tegra:
        iommu support
        universal plane support
        ganged-mode DSI support

   - sti:
        HDMI i2c improvements

   - vmwgfx:
        some late fixes.

   - qxl:
        use suggested x/y properties"

* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (969 commits)
  drm: sti: fix module compilation issue
  drm/i915: save/restore GMBUS freq across suspend/resume on gen4
  drm: sti: correctly cleanup CRTC and planes
  drm: sti: add HQVDP plane
  drm: sti: add cursor plane
  drm: sti: enable auxiliary CRTC
  drm: sti: fix delay in VTG programming
  drm: sti: prepare sti_tvout to support auxiliary crtc
  drm: sti: use drm_crtc_vblank_{on/off} instead of drm_vblank_{on/off}
  drm: sti: fix hdmi avi infoframe
  drm: sti: remove event lock while disabling vblank
  drm: sti: simplify gdp code
  drm: sti: clear all mixer control
  drm: sti: remove gpio for HDMI hot plug detection
  drm: sti: allow to change hdmi ddc i2c adapter
  drm/doc: Document drm_add_modes_noedid() usage
  drm/i915: Remove '& 0xffff' from the mask given to WA_REG()
  drm/i915: Invert the mask and val arguments in wa_add() and WA_REG()
  drm: Zero out DRM object memory upon cleanup
  drm/i915/bdw: Fix the write setting up the WIZ hashing mode
  ...

10 years agox86: mm: consolidate VM_FAULT_RETRY handling
Linus Torvalds [Mon, 15 Dec 2014 23:07:33 +0000 (15:07 -0800)]
x86: mm: consolidate VM_FAULT_RETRY handling

The VM_FAULT_RETRY handling was confusing and incorrect for the case of
returning to kernel mode.  We need to handle the exception table fixup
if we return to kernel mode due to a fatal signal - it will basically
look to the kernel user mode access like the access failed due to the VM
going away from udner it.  Which is correct - the process is dying - and
avoids the whole "repeat endless kernel page faults" case.

Handling the VM_FAULT_RETRY early and in just one place also simplifies
the mmap_sem handling, since once we've taken care of VM_FAULT_RETRY we
know that we can just drop the lock.  The remaining accounting and
possible error handling is thread-local and does not need the mmap_sem.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agox86: mm: move mmap_sem unlock from mm_fault_error() to caller
Linus Torvalds [Mon, 15 Dec 2014 22:46:06 +0000 (14:46 -0800)]
x86: mm: move mmap_sem unlock from mm_fault_error() to caller

This replaces four copies in various stages of mm_fault_error() handling
with just a single one.  It will also allow for more natural placement
of the unlocking after some further cleanup.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoMerge tag 'omap-for-v3.19/fixes-for-merge-window' of git://git.kernel.org/pub/scm...
Kevin Hilman [Mon, 15 Dec 2014 21:59:43 +0000 (13:59 -0800)]
Merge tag 'omap-for-v3.19/fixes-for-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes

From: Tony Lindgren <tony@atomide.com>
Subject: [GIT PULL] few fixes for the v3.19 merge window

Fixes for a few issues found that would be good to get
into -rc1:

- Update SoC revision detection for am43x es1.2

- Fix regression with GPMC timings on 2430sdp for some versions
  of u-boot

- Fix dra7 watchdog compatible property

- Fix am437x-sk-evm LCD timings

- Fix dra7 DSS clock muxing

- Fix dra7-evm voltages

- Remove a unused function prototype for am33xx_clk_init

- Enable AHCI in the omap2plus_defconfig

* tag 'omap-for-v3.19/fixes-for-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (1601 commits)
  ARM: omap2plus_defconfig: Enable AHCI_PLATFORM driver
  ARM: dts: am437x-sk-evm.dts: fix LCD timings
  ARM: dts: dra7-evm: Update SMPS7 (VDD_CORE) max voltage to match DM
  ARM: dts: dra7-evm: Fix typo in SMPS6 (VDD_GPU) max voltage
  ARM: OMAP2+: AM43x: Add ID for ES1.2
  ARM: dts: am437x-sk: fix lcd enable pin mux data
  ARM: dts: Fix gpmc regression for omap 2430sdp smc91x
  hwmon: (tmp401) Detect TMP435 on all addresses it supports
  mfd: rtsx: Add func to split u32 into register
  mmc: sdhci-msm: Convert to mmc_send_tuning()
  mmc: sdhci-esdhc-imx: Convert to mmc_send_tuning()
  mmc: core: Let mmc_send_tuning() to take struct mmc_host* as parameter
  nios2: Make NIOS2_CMDLINE_IGNORE_DTB depend on CMDLINE_BOOL
  nios2: Add missing NR_CPUS to Kconfig
  nios2: asm-offsets: Remove unused definition TI_TASK
  nios2: Remove write-only struct member from nios2_timer
  nios2: Remove unused extern declaration of shm_align_mask
  nios2: include linux/type.h in io.h
  nios2: move include asm-generic/io.h to end of file
  nios2: remove include asm-generic/iomap.h from io.h
  ...

10 years agomic/host: fix up virtio 1.0 APIs
Michael S. Tsirkin [Thu, 11 Dec 2014 23:55:03 +0000 (01:55 +0200)]
mic/host: fix up virtio 1.0 APIs

This just makes code sparse-clean by using
new memory access APIs in one file I missed.
The new feature bit is not yet negotiated.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agovringh: update for virtio 1.0 APIs
Michael S. Tsirkin [Thu, 11 Dec 2014 23:10:49 +0000 (01:10 +0200)]
vringh: update for virtio 1.0 APIs

When switching everything over to virtio 1.0 memory access APIs,
I missed converting vringh.
Fortunately, it's straight-forward.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agovringh: 64 bit features
Michael S. Tsirkin [Thu, 11 Dec 2014 22:36:06 +0000 (00:36 +0200)]
vringh: 64 bit features

Pass u64 everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agotools/virtio: add virtio 1.0 in vringh_test
Michael S. Tsirkin [Sun, 14 Dec 2014 21:36:33 +0000 (23:36 +0200)]
tools/virtio: add virtio 1.0 in vringh_test

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agotools/virtio: add virtio 1.0 in virtio_test
Michael S. Tsirkin [Sun, 14 Dec 2014 20:49:16 +0000 (22:49 +0200)]
tools/virtio: add virtio 1.0 in virtio_test

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agotools/virtio: enable -Werror
Michael S. Tsirkin [Sun, 14 Dec 2014 21:26:54 +0000 (23:26 +0200)]
tools/virtio: enable -Werror

Seems to mostly be a positive.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agotools/virtio: 64 bit features
Michael S. Tsirkin [Sun, 14 Dec 2014 20:52:25 +0000 (22:52 +0200)]
tools/virtio: 64 bit features

Missed one place where vringh_test used
long to pass features. Fix it up to u64.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agotools/virtio: fix vringh test
Michael S. Tsirkin [Sun, 14 Dec 2014 19:46:04 +0000 (21:46 +0200)]
tools/virtio: fix vringh test

Include missing virtio_config.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agotools/virtio: more stubs
Michael S. Tsirkin [Mon, 15 Dec 2014 21:47:07 +0000 (23:47 +0200)]
tools/virtio: more stubs

As usual, add more stubs to fix test build after main
codebase changes.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
10 years agoARM: defconfigs: use CONFIG_CPUFREQ_DT
Viresh Kumar [Mon, 15 Dec 2014 04:18:19 +0000 (09:48 +0530)]
ARM: defconfigs: use CONFIG_CPUFREQ_DT

CONFIG_GENERIC_CPUFREQ_CPU0 disappeared with commit bbcf071969b20f
("cpufreq: cpu0: rename driver and internals to 'cpufreq_dt'") and some
defconfigs are still using it instead of the new one.

Use the renamed CONFIG_CPUFREQ_DT generic driver.

Cc: <stable@vger.kernel.org> # 3.18
Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Kevin Hilman <khilman@linaro.org>
10 years agortlwifi: rtl8192ce: Set fw_ready flag
Larry Finger [Wed, 10 Dec 2014 20:38:29 +0000 (14:38 -0600)]
rtlwifi: rtl8192ce: Set fw_ready flag

The setting of this flag was missed in previous modifications.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
10 years agobrcmsmac: don't leak kernel memory via printk()
Brian Norris [Wed, 10 Dec 2014 09:39:18 +0000 (01:39 -0800)]
brcmsmac: don't leak kernel memory via printk()

Debug code prints the fifo name via custom dev_warn() wrappers. The
fifo_names array is only non-zero when debugging is manually enabled,
which is all well and good. However, it's *not* good that this array
uses zero-length arrays in the non-debug case, and so it doesn't
actually have any memory allocated to it. This means that as far as we
know, fifo_names[i] actually points to garbage memory.

I've seen this in my log:

[ 4601.205511] brcmsmac bcma0:1: wl0: brcms_c_d11hdrs_mac80211: �GeL txop exceeded phylen 137/256 dur 1602/1504

So let's give this array space enough to fill it with a NULL byte.

Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Cc: Brett Rudley <brudley@broadcom.com>
Cc: Arend van Spriel <arend@broadcom.com>
Cc: "Franky (Zhenhui) Lin" <frankyl@broadcom.com>
Cc: Hante Meuleman <meuleman@broadcom.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: linux-wireless@vger.kernel.org
Cc: brcm80211-dev-list@broadcom.com
Cc: netdev@vger.kernel.org
Acked-by: Arend van Spriel <arend@broadcom.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
10 years agortlwifi: rtl8192cu: Fix sparse non static symbol warning
Wei Yongjun [Tue, 9 Dec 2014 13:18:43 +0000 (21:18 +0800)]
rtlwifi: rtl8192cu: Fix sparse non static symbol warning

Fixes the following sparse warning:

drivers/net/wireless/rtlwifi/rtl8192cu/hw.c:1595:6: warning:
 symbol 'usb_cmd_send_packet' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
10 years agortlwifi: rtl8821ae: fix misspelling of current function in string
Julia Lawall [Sun, 7 Dec 2014 19:20:58 +0000 (20:20 +0100)]
rtlwifi: rtl8821ae: fix misspelling of current function in string

Replace a misspelled function name by %s and then __func__.

8821 was written as 8812.

This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
10 years agohostap_cs: fix misspelling of current function in string
Julia Lawall [Sun, 7 Dec 2014 19:20:57 +0000 (20:20 +0100)]
hostap_cs: fix misspelling of current function in string

Replace a misspelled function name by %s and then __func__.

This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
10 years agozd1211rw: fix misspelling of current function in string
Julia Lawall [Sun, 7 Dec 2014 19:20:55 +0000 (20:20 +0100)]
zd1211rw: fix misspelling of current function in string

Replace a misspelled function name by %s and then __func__.

This was done using Coccinelle, including the use of Levenshtein distance,
as proposed by Rasmus Villemoes.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
10 years agoMerge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetoot...
John W. Linville [Mon, 15 Dec 2014 18:23:09 +0000 (13:23 -0500)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

10 years agoplatform/x86/acerhdf: Still depends on THERMAL
Randy Dunlap [Mon, 15 Dec 2014 17:00:13 +0000 (09:00 -0800)]
platform/x86/acerhdf: Still depends on THERMAL

acerhdf uses thermal interfaces so it should depend on THERMAL.
It also should not select a thermal driver without checking that
THERMAL is enabled.

This fixes the following build errors when THERMAL=m and
ACERHDF=y.

drivers/built-in.o: In function `acerhdf_set_mode':
acerhdf.c:(.text+0x3e02e1): undefined reference to `thermal_zone_device_update'
drivers/built-in.o: In function `acerhdf_unbind':
acerhdf.c:(.text+0x3e052d): undefined reference to `thermal_zone_unbind_cooling_device'
drivers/built-in.o: In function `acerhdf_bind':
acerhdf.c:(.text+0x3e0593): undefined reference to `thermal_zone_bind_cooling_device'
drivers/built-in.o: In function `acerhdf_init':
acerhdf.c:(.init.text+0x1c2f5): undefined reference to `thermal_cooling_device_register'
acerhdf.c:(.init.text+0x1c360): undefined reference to `thermal_zone_device_register'
drivers/built-in.o: In function `acerhdf_unregister_thermal':
acerhdf.c:(.text.unlikely+0x3c67): undefined reference to `thermal_cooling_device_unregister'
acerhdf.c:(.text.unlikely+0x3c91): undefined reference to `thermal_zone_device_unregister'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Peter Feuerer <peter@piie.net>
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
10 years agoMerge branch 'macb'
David S. Miller [Mon, 15 Dec 2014 16:50:50 +0000 (11:50 -0500)]
Merge branch 'macb'

Cyrille Pitchen says:

====================
net/macb: fix multiqueue support patch up

This series of patches is a fixup for the multiqueue support patch.

The first patch fixes a bug introduced by the multiqueue support patch.
The second one doesn't fix a bug but simplify the source code by removing
useless calls to devm_free_irq() since we use managed device resources for
IRQs.

They were applied on the net-next tree and tested with a sama5d36ek board.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/macb: remove useless calls of devm_free_irq()
Cyrille Pitchen [Mon, 15 Dec 2014 14:13:32 +0000 (15:13 +0100)]
net/macb: remove useless calls of devm_free_irq()

Inside macb_probe(), when devm_request_irq() fails on queue q, there is no need
to call devm_free_irq() on queues 0..q-1 because the managed device resources
are released later when calling free_netdev().

Also removing devm_free_irq() call from macb_remove() for the same reason.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/macb: fix misplaced call of free_netdev() in macb_remove()
Cyrille Pitchen [Mon, 15 Dec 2014 14:13:31 +0000 (15:13 +0100)]
net/macb: fix misplaced call of free_netdev() in macb_remove()

fix a bug introduced by the multiqueue support patch:
"net/macb: add TX multiqueue support for gem"

the "bp" pointer to the netdev private data was dereferenced and used after the
associated memory had been freed by calling free_netdev().

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agords: Fix min() warning in rds_message_inc_copy_to_user()
Geert Uytterhoeven [Mon, 15 Dec 2014 12:21:42 +0000 (13:21 +0100)]
rds: Fix min() warning in rds_message_inc_copy_to_user()

net/rds/message.c: In function ‘rds_message_inc_copy_to_user’:
net/rds/message.c:328: warning: comparison of distinct pointer types lacks a cast

Use min_t(unsigned long, ...) like is done in
rds_message_copy_from_user().

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: stmmac: sti: Fix uninitialized pointer dereference if !OF
Geert Uytterhoeven [Mon, 15 Dec 2014 11:25:51 +0000 (12:25 +0100)]
net: stmmac: sti: Fix uninitialized pointer dereference if !OF

If CONFIG_OF is not set:

drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c: In function ‘sti_dwmac_parse_data’:
drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c:318: warning: ‘rs’ is used uninitialized in this function

of_property_read_string() will return -ENOSYS in this case, and rs will
be an uninitialized pointer.

While the fallback clock selection is already selected correctly in this
case, the string comparisons should be skipped too, else the system will
crash while dereferencing the uninitialized pointer.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet: smc91x: Fix build without gpiolib
Tobias Klauser [Mon, 15 Dec 2014 09:02:27 +0000 (10:02 +0100)]
net: smc91x: Fix build without gpiolib

If GPIOLIB=n the following build errors occur:

drivers/net/ethernet/smsc/smc91x.c: In function 'try_toggle_control_gpio':
drivers/net/ethernet/smsc/smc91x.c:2204:2: error: implicit declaration of function 'devm_gpiod_get_index' [-Werror=implicit-function-declaration]
drivers/net/ethernet/smsc/smc91x.c:2204:7: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/ethernet/smsc/smc91x.c:2213:2: error: implicit declaration of function 'gpiod_direction_output' [-Werror=implicit-function-declaration]
drivers/net/ethernet/smsc/smc91x.c:2216:3: error: implicit declaration of function 'devm_gpiod_put' [-Werror=implicit-function-declaration]
drivers/net/ethernet/smsc/smc91x.c:2222:2: error: implicit declaration of function 'gpiod_set_value_cansleep' [-Werror=implicit-function-declaration]

Fix this by letting the driver depend on GPIOLIB if OF is selected.

Fixes: 7d2911c4381 ("net: smc91x: Fix gpios for device tree based booting")
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agogre: fix the inner mac header in nbma tunnel xmit path
Timo Teräs [Mon, 15 Dec 2014 07:24:13 +0000 (09:24 +0200)]
gre: fix the inner mac header in nbma tunnel xmit path

The NBMA GRE tunnels temporarily push GRE header that contain the
per-packet NBMA destination on the skb via header ops early in xmit
path. It is the later pulled before the real GRE header is constructed.

The inner mac was thus set differently in nbma case: the GRE header
has been pushed by neighbor layer, and mac header points to beginning
of the temporary gre header (set by dev_queue_xmit).

Now that the offloads expect mac header to point to the gre payload,
fix the xmit patch to:
 - pull first the temporary gre header away
 - and reset mac header to point to gre payload

This fixes tso to work again with nbma tunnels.

Fixes: 14051f0452a2 ("gre: Use inner mac length when computing tunnel length")
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Cc: Tom Herbert <therbert@google.com>
Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agofib_trie.txt: fix typo
Duan Jiong [Mon, 15 Dec 2014 06:58:53 +0000 (14:58 +0800)]
fib_trie.txt: fix typo

Fix the typo, there should be "It".
On the other hand, fix whitespace errors detected by checkpatch.pl

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agocirrus: cs89x0: fix time comparison
Asaf Vertz [Sun, 14 Dec 2014 08:34:18 +0000 (10:34 +0200)]
cirrus: cs89x0: fix time comparison

To be future-proof and for better readability the time comparisons are
modified to use time_before, time_after, and time_after_eq instead of
plain, error-prone math.

Signed-off-by: Asaf Vertz <asaf.vertz@tandemg.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoMerge branch 'devel/for-linus-3.19' into stable/for-linus-3.19
David Vrabel [Mon, 15 Dec 2014 16:41:00 +0000 (16:41 +0000)]
Merge branch 'devel/for-linus-3.19' into stable/for-linus-3.19

10 years agoMerge branch 'mlx4'
David S. Miller [Mon, 15 Dec 2014 16:35:00 +0000 (11:35 -0500)]
Merge branch 'mlx4'

Or Gerlitz says:

====================
mlx4 driver fixes for 3.19-rc1

Just fixes for two small issues introduced in the 3.19 merge window
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_core: Avoid double dumping of the PF device capabilities
Or Gerlitz [Sun, 14 Dec 2014 14:18:05 +0000 (16:18 +0200)]
net/mlx4_core: Avoid double dumping of the PF device capabilities

To support asymmetric EQ allocations, we should query the device
capabilities prior to enabling SRIOV. As a side effect of adding that,
we are dumping the PF device capabilities twice. Avoid that by moving
the printing into a helper function which is called once.

Fixes: 7ae0e400cd93 ('net/mlx4_core: Flexible (asymmetric) allocation of
     EQs and MSI-X vectors for PF/VFs')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agonet/mlx4_core: Fixed memory leak and incorrect refcount in mlx4_load_one
Matan Barak [Sun, 14 Dec 2014 14:18:04 +0000 (16:18 +0200)]
net/mlx4_core: Fixed memory leak and incorrect refcount in mlx4_load_one

The current mlx4_load_one has a memory leak as it always allocates
dev_cap, but frees it only on error.

In addition, even if VFs exist when mlx4_load_one is called,
we still need to notify probed VFs that we're loading (by
incrementing pf_loading).

Fixes: a0eacca948d2 ('net/mlx4_core: Refactor mlx4_load_one')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
10 years agoRevert "blk-mq: Micro-optimize bt_get()"
Jens Axboe [Mon, 15 Dec 2014 15:30:26 +0000 (08:30 -0700)]
Revert "blk-mq: Micro-optimize bt_get()"

This reverts commit 52f7eb945f2ba62b324bb9ae16d945326a961dcf.

The optimization is only really safe for a single queue, otherwise
'bs' and 'bt' can indeed change, and if we don't do a finish_wait()
for each loop, we'll potentially change the wait structure and
corrupt task wait list.

Reported-by: Jan Kara <jack@suse.cz>
10 years agotracing: Add tp_printk cmdline to have tracepoints go to printk()
Steven Rostedt (Red Hat) [Sat, 13 Dec 2014 03:27:10 +0000 (22:27 -0500)]
tracing: Add tp_printk cmdline to have tracepoints go to printk()

Add the kernel command line tp_printk option that will have tracepoints
that are active sent to printk() as well as to the trace buffer.

Passing "tp_printk" will activate this. To turn it off, the sysctl
/proc/sys/kernel/tracepoint_printk can have '0' echoed into it. Note,
this only works if the cmdline option is used. Echoing 1 into the sysctl
file without the cmdline option will have no affect.

Note, this is a dangerous option. Having high frequency tracepoints send
their data to printk() can possibly cause a live lock. This is another
reason why this is only active if the command line option is used.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10 years agotracing: Move enabling tracepoints to just after rcu_init()
Steven Rostedt (Red Hat) [Sat, 13 Dec 2014 01:05:10 +0000 (20:05 -0500)]
tracing: Move enabling tracepoints to just after rcu_init()

Enabling tracepoints at boot up can be very useful. The tracepoint
can be initialized right after RCU has been. There's no need to
wait for the early_initcall() to be called. That's too late for some
things that can use tracepoints for debugging. Move the logic to
enable tracepoints out of the initcalls and into init/main.c to
right after rcu_init().

This also allows trace_printk() to be used early too.

Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1412121539300.16494@nanos
Link: http://lkml.kernel.org/r/20141214164104.307127356@goodmis.org
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10 years agoisofs: Fix infinite looping over CE entries
Jan Kara [Mon, 15 Dec 2014 13:22:46 +0000 (14:22 +0100)]
isofs: Fix infinite looping over CE entries

Rock Ridge extensions define so called Continuation Entries (CE) which
define where is further space with Rock Ridge data. Corrupted isofs
image can contain arbitrarily long chain of these, including a one
containing loop and thus causing kernel to end in an infinite loop when
traversing these entries.

Limit the traversal to 32 entries which should be more than enough space
to store all the Rock Ridge data.

Reported-by: P J P <ppandit@redhat.com>
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
10 years agoACPI / video: update the skip case for acpi_video_device_in_dod()
Aaron Lu [Mon, 15 Dec 2014 08:01:29 +0000 (16:01 +0800)]
ACPI / video: update the skip case for acpi_video_device_in_dod()

If the firmware has declared more than 8 video output devices, and the
one that control the internal panel's backlight is listed after the
first 8 output devices, the _DOD will not include it due to the current
i915 operation region implementation. As a result, we will not create a
backlight device for it while we should. Solve this problem by special
case the firmware that has 8+ output devices in that if we see such a
firmware, we do not test if the device is in _DOD list. The creation of
the backlight device will also enable the firmware to emit events on
backlight hotkey press when the acpi_osi= cmdline option is specified on
those affected ASUS laptops.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=70241
Reported-and-tested-by: Oleksij Rempel <linux@rempel-privat.de>
Reported-and-tested-by: Dmitry Tunin <hanipouspilot@gmail.com>
Reported-and-tested-by: Jimbo <jaime.91@hotmail.es>
Cc: 3.18+ <stable@vger.kernel.org> # 3.18+
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
10 years agopower / PM: Eliminate CONFIG_PM_RUNTIME
Rafael J. Wysocki [Sun, 14 Dec 2014 22:14:46 +0000 (23:14 +0100)]
power / PM: Eliminate CONFIG_PM_RUNTIME

After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME within #ifdef blocks depending on
CONFIG_PM may be dropped now.

Do that in drivers/power/pm2301_charger.c.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
10 years agoNFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
Rafael J. Wysocki [Sun, 14 Dec 2014 22:14:36 +0000 (23:14 +0100)]
NFC / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM

After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.

Replace CONFIG_PM_RUNTIME with CONFIG_PM in drivers/nfc/trf7970a.c.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
10 years agoSCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM
Rafael J. Wysocki [Sun, 14 Dec 2014 22:13:55 +0000 (23:13 +0100)]
SCSI / PM: Replace CONFIG_PM_RUNTIME with CONFIG_PM

After commit b2b49ccbdd54 (PM: Kconfig: Set PM_RUNTIME if PM_SLEEP is
selected) PM_RUNTIME is always set if PM is set, so #ifdef blocks
depending on CONFIG_PM_RUNTIME may now be changed to depend on
CONFIG_PM.

Replace CONFIG_PM_RUNTIME with CONFIG_PM everywhere under
drivers/scsi/ and in include/scsi/scsi_device.h.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Aaron Lu <aaron.lu@intel.com>
Acked-by: Christoph Hellwig <hch@lst.de>
10 years agoACPI / EC: Fix unexpected ec_remove_handlers() invocations
Lv Zheng [Mon, 15 Dec 2014 00:47:52 +0000 (08:47 +0800)]
ACPI / EC: Fix unexpected ec_remove_handlers() invocations

The ec_remove_handlers() is invoked without checking
EC_FLAGS_HANDLERS_INSTALLED, this patch enhances this check to avoid issues
that acpi_disable_gpe() is invoked unexpectedly to reduce the GPE runtime
count. This may happen when the EC handler installation failed on some
platforms.

Reported-by: Venkat Raghavulu <venkat.raghavulu@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
10 years agoKVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
Suresh E. Warrier [Mon, 3 Nov 2014 04:52:00 +0000 (15:52 +1100)]
KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked

The kvmppc_vcore_blocked() code does not check for the wait condition
after putting the process on the wait queue. This means that it is
possible for an external interrupt to become pending, but the vcpu to
remain asleep until the next decrementer interrupt.  The fix is to
make one last check for pending exceptions and ceded state before
calling schedule().

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: Book3S HV: ptes are big endian
Cédric Le Goater [Thu, 20 Nov 2014 23:45:59 +0000 (00:45 +0100)]
KVM: PPC: Book3S HV: ptes are big endian

When being restored from qemu, the kvm_get_htab_header are in native
endian, but the ptes are big endian.

This patch fixes restore on a KVM LE host. Qemu also needs a fix for
this :

     http://lists.nongnu.org/archive/html/qemu-ppc/2014-11/msg00008.html

Signed-off-by: Cédric Le Goater <clg@fr.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
Suresh E. Warrier [Mon, 3 Nov 2014 04:51:59 +0000 (15:51 +1100)]
KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI

This fixes some inaccuracies in the state machine for the virtualized
ICP when implementing the H_IPI hcall (Set_MFFR and related states):

1. The old code wipes out any pending interrupts when the new MFRR is
   more favored than the CPPR but less favored than a pending
   interrupt (by always modifying xisr and the pending_pri). This can
   cause us to lose a pending external interrupt.

   The correct code here is to only modify the pending_pri and xisr in
   the ICP if the MFRR is equal to or more favored than the current
   pending pri (since in this case, it is guaranteed that that there
   cannot be a pending external interrupt). The code changes are
   required in both kvmppc_rm_h_ipi and kvmppc_h_ipi.

2. Again, in both kvmppc_rm_h_ipi and kvmppc_h_ipi, there is a check
   for whether MFRR is being made less favored AND further if new MFFR
   is also less favored than the current CPPR, we check for any
   resends pending in the ICP. These checks look like they are
   designed to cover the case where if the MFRR is being made less
   favored, we opportunistically trigger a resend of any interrupts
   that had been previously rejected. Although, this is not a state
   described by PAPR, this is an action we actually need to do
   especially if the CPPR is already at 0xFF.  Because in this case,
   the resend bit will stay on until another ICP state change which
   may be a long time coming and the interrupt stays pending until
   then. The current code which checks for MFRR < CPPR is broken when
   CPPR is 0xFF since it will not get triggered in that case.

   Ideally, we would want to do a resend only if

    prio(pending_interrupt) < mfrr && prio(pending_interrupt) < cppr

   where pending interrupt is the one that was rejected. But we don't
   have the priority of the pending interrupt state saved, so we
   simply trigger a resend whenever the MFRR is made less favored.

3. In kvmppc_rm_h_ipi, where we save state to pass resends to the
   virtual mode, we also need to save the ICP whose need_resend we
   reset since this does not need to be my ICP (vcpu->arch.icp) as is
   incorrectly assumed by the current code. A new field rm_resend_icp
   is added to the kvmppc_icp structure for this purpose.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: Book3S HV: Fix KSM memory corruption
Paul Mackerras [Mon, 3 Nov 2014 04:51:58 +0000 (15:51 +1100)]
KVM: PPC: Book3S HV: Fix KSM memory corruption

Testing with KSM active in the host showed occasional corruption of
guest memory.  Typically a page that should have contained zeroes
would contain values that look like the contents of a user process
stack (values such as 0x0000_3fff_xxxx_xxx).

Code inspection in kvmppc_h_protect revealed that there was a race
condition with the possibility of granting write access to a page
which is read-only in the host page tables.  The code attempts to keep
the host mapping read-only if the host userspace PTE is read-only, but
if that PTE had been temporarily made invalid for any reason, the
read-only check would not trigger and the host HPTE could end up
read-write.  Examination of the guest HPT in the failure situation
revealed that there were indeed shared pages which should have been
read-only that were mapped read-write.

To close this race, we don't let a page go from being read-only to
being read-write, as far as the real HPTE mapping the page is
concerned (the guest view can go to read-write, but the actual mapping
stays read-only).  When the guest tries to write to the page, we take
an HDSI and let kvmppc_book3s_hv_page_fault take care of providing a
writable HPTE for the page.

This eliminates the occasional corruption of shared pages
that was previously seen with KSM active.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
Mahesh Salgaonkar [Mon, 3 Nov 2014 04:51:57 +0000 (15:51 +1100)]
KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI

When we get an HMI (hypervisor maintenance interrupt) while in a
guest, we see that guest enters into paused state.  The reason is, in
kvmppc_handle_exit_hv it falls through default path and returns to
host instead of resuming guest.  This causes guest to enter into
paused state.  HMI is a hypervisor only interrupt and it is safe to
resume the guest since the host has handled it already.  This patch
adds a switch case to resume the guest.

Without this patch we see guest entering into paused state with following
console messages:

[ 3003.329351] Severe Hypervisor Maintenance interrupt [Recovered]
[ 3003.329356]  Error detail: Timer facility experienced an error
[ 3003.329359]  HMER: 0840000000000000
[ 3003.329360]  TFMR: 4a12000980a84000
[ 3003.329366] vcpu c0000007c35094c0 (40):
[ 3003.329368] pc  = c0000000000c2ba0  msr = 8000000000009032  trap = e60
[ 3003.329370] r 0 = c00000000021ddc0  r16 = 0000000000000046
[ 3003.329372] r 1 = c00000007a02bbd0  r17 = 00003ffff27d5d98
[ 3003.329375] r 2 = c0000000010980b8  r18 = 00001fffffc9a0b0
[ 3003.329377] r 3 = c00000000142d6b8  r19 = c00000000142d6b8
[ 3003.329379] r 4 = 0000000000000002  r20 = 0000000000000000
[ 3003.329381] r 5 = c00000000524a110  r21 = 0000000000000000
[ 3003.329383] r 6 = 0000000000000001  r22 = 0000000000000000
[ 3003.329386] r 7 = 0000000000000000  r23 = c00000000524a110
[ 3003.329388] r 8 = 0000000000000000  r24 = 0000000000000001
[ 3003.329391] r 9 = 0000000000000001  r25 = c00000007c31da38
[ 3003.329393] r10 = c0000000014280b8  r26 = 0000000000000002
[ 3003.329395] r11 = 746f6f6c2f68656c  r27 = c00000000524a110
[ 3003.329397] r12 = 0000000028004484  r28 = c00000007c31da38
[ 3003.329399] r13 = c00000000fe01400  r29 = 0000000000000002
[ 3003.329401] r14 = 0000000000000046  r30 = c000000003011e00
[ 3003.329403] r15 = ffffffffffffffba  r31 = 0000000000000002
[ 3003.329404] ctr = c00000000041a670  lr  = c000000000272520
[ 3003.329405] srr0 = c00000000007e8d8 srr1 = 9000000000001002
[ 3003.329406] sprg0 = 0000000000000000 sprg1 = c00000000fe01400
[ 3003.329407] sprg2 = c00000000fe01400 sprg3 = 0000000000000005
[ 3003.329408] cr = 48004482  xer = 2000000000000000  dsisr = 42000000
[ 3003.329409] dar = 0000010015020048
[ 3003.329410] fault dar = 0000010015020048 dsisr = 42000000
[ 3003.329411] SLB (8 entries):
[ 3003.329412]   ESID = c000000008000000 VSID = 40016e7779000510
[ 3003.329413]   ESID = d000000008000001 VSID = 400142add1000510
[ 3003.329414]   ESID = f000000008000004 VSID = 4000eb1a81000510
[ 3003.329415]   ESID = 00001f000800000b VSID = 40004fda0a000d90
[ 3003.329416]   ESID = 00003f000800000c VSID = 400039f536000d90
[ 3003.329417]   ESID = 000000001800000d VSID = 0001251b35150d90
[ 3003.329417]   ESID = 000001000800000e VSID = 4001e46090000d90
[ 3003.329418]   ESID = d000080008000019 VSID = 40013d349c000400
[ 3003.329419] lpcr = c048800001847001 sdr1 = 0000001b19000006 last_inst = ffffffff
[ 3003.329421] trap=0xe60 | pc=0xc0000000000c2ba0 | msr=0x8000000000009032
[ 3003.329524] Severe Hypervisor Maintenance interrupt [Recovered]
[ 3003.329526]  Error detail: Timer facility experienced an error
[ 3003.329527]  HMER: 0840000000000000
[ 3003.329527]  TFMR: 4a12000980a94000
[ 3006.359786] Severe Hypervisor Maintenance interrupt [Recovered]
[ 3006.359792]  Error detail: Timer facility experienced an error
[ 3006.359795]  HMER: 0840000000000000
[ 3006.359797]  TFMR: 4a12000980a84000

 Id    Name                           State
----------------------------------------------------
 2     guest2                         running
 3     guest3                         paused
 4     guest4                         running

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: Book3S HV: Fix computation of tlbie operand
Paul Mackerras [Mon, 3 Nov 2014 04:51:56 +0000 (15:51 +1100)]
KVM: PPC: Book3S HV: Fix computation of tlbie operand

The B (segment size) field in the RB operand for the tlbie
instruction is two bits, which we get from the top two bits of
the first doubleword of the HPT entry to be invalidated.  These
bits go in bits 8 and 9 of the RB operand (bits 54 and 55 in IBM
bit numbering).

The compute_tlbie_rb() function gets these bits as v >> (62 - 8),
which is not correct as it will bring in the top 10 bits, not
just the top two.  These extra bits could corrupt the AP, AVAL
and L fields in the RB value.  To fix this we shift right 62 bits
and then shift left 8 bits, so we only get the two bits of the
B field.

The first doubleword of the HPT entry is under the control of the
guest kernel.  In fact, Linux guests will always put zeroes in bits
54 -- 61 (IBM bits 2 -- 9), but we should not rely on guests doing
this.

Signed-off-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: Book3S HV: Add missing HPTE unlock
Aneesh Kumar K.V [Wed, 5 Nov 2014 01:21:13 +0000 (12:21 +1100)]
KVM: PPC: Book3S HV: Add missing HPTE unlock

In kvm_test_clear_dirty(), if we find an invalid HPTE we move on to the
next HPTE without unlocking the invalid one.  In fact we should never
find an invalid and unlocked HPTE in the rmap chain, but for robustness
we should unlock it.  This adds the missing unlock.

Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoKVM: PPC: BookE: Improve irq inject tracepoint
Alexander Graf [Mon, 10 Nov 2014 23:15:53 +0000 (00:15 +0100)]
KVM: PPC: BookE: Improve irq inject tracepoint

When injecting an IRQ, we only document which IRQ priority (which translates
to IRQ type) gets injected. However, when reading traces you don't necessarily
have all the numbers in your head to know which IRQ really is meant.

This patch converts the IRQ number field to a symbolic name that is in sync
with the respective define. That way it's a lot easier for readers to figure
out what interrupt gets injected.

Signed-off-by: Alexander Graf <agraf@suse.de>
10 years agoMerge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Paolo Bonzini [Mon, 15 Dec 2014 12:06:40 +0000 (13:06 +0100)]
Merge tag 'kvm-arm-for-3.19-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

Second round of changes for KVM for arm/arm64 for v3.19; fixes reboot
problems, clarifies VCPU init, and fixes a regression concerning the
VGIC init flow.

Conflicts:
arch/ia64/kvm/kvm-ia64.c [deleted in HEAD and modified in kvmarm]

10 years agoarm/arm64: KVM: Require in-kernel vgic for the arch timers
Christoffer Dall [Fri, 12 Dec 2014 20:19:23 +0000 (21:19 +0100)]
arm/arm64: KVM: Require in-kernel vgic for the arch timers

It is curently possible to run a VM with architected timers support
without creating an in-kernel VGIC, which will result in interrupts from
the virtual timer going nowhere.

To address this issue, move the architected timers initialization to the
time when we run a VCPU for the first time, and then only initialize
(and enable) the architected timers if we have a properly created and
initialized in-kernel VGIC.

When injecting interrupts from the virtual timer to the vgic, the
current setup should ensure that this never calls an on-demand init of
the VGIC, which is the only call path that could return an error from
kvm_vgic_inject_irq(), so capture the return value and raise a warning
if there's an error there.

We also change the kvm_timer_init() function from returning an int to be
a void function, since the function always succeeds.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
10 years agoarm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs
Christoffer Dall [Tue, 9 Dec 2014 13:35:33 +0000 (14:35 +0100)]
arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs

Userspace assumes that it can wire up IRQ injections after having
created all VCPUs and after having created the VGIC, but potentially
before starting the first VCPU.  This can currently lead to lost IRQs
because the state of that IRQ injection is not stored anywhere and we
don't return an error to userspace.

We haven't seen this problem manifest itself yet, presumably because
guests reset the devices on boot, but this could cause issues with
migration and other non-standard startup configurations.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
This page took 0.133136 seconds and 4 git commands to generate.