Antonio Ospite [Tue, 30 Jun 2015 21:59:03 +0000 (14:59 -0700)]
printk: improve the description of /dev/kmsg line format
The comment about /dev/kmsg does not mention the additional values which
may actually be exported, fix that.
Also move up the part of the comment instructing the users to ignore these
additional values, this way the reading is more fluent and logically
compact.
do_device_access() takes a separate parameter to indicate the direction of
data transfer, which it used to use to select the appropriate function out
of sg_pcopy_{to,from}_buffer(). However these two functions now have
So this patch makes it bypass these wrappers and call the underlying
function sg_copy_buffer() directly; this has the same calling style as
do_device_access() i.e. a separate direction-of-transfer parameter and no
pointers-to-const, so skipping the wrappers not only eliminates the
warning, it also make the code simpler :)
Dave Gordon [Tue, 30 Jun 2015 21:58:54 +0000 (14:58 -0700)]
lib/scatterlist: mark input buffer parameters as 'const'
The 'buf' parameter of sg(p)copy_from_buffer() can and should be
const-qualified, although because of the shared implementation of
_to_buffer() and _from_buffer(), we have to cast this away internally.
This means that callers who have a 'const' buffer containing the data to
be copied to the sg-list no longer have to cast away the const-ness
themselves. It also enables improved coverage by code analysis tools.
Dave Gordon [Tue, 30 Jun 2015 21:58:52 +0000 (14:58 -0700)]
lib/scatterlist.c: fix kerneldoc for sg_pcopy_{to,from}_buffer()
The kerneldoc for the functions doesn't match the code; the last two
parameters (buflen, skip) have been transposed, which is confusing,
especially as they're both integral types and the compiler won't warn
about swapping them.
These functions and the kerneldoc were introduced in commit: df642cea lib/scatterlist: introduce sg_pcopy_from_buffer() ...
Author: Akinobu Mita <[email protected]>
Date: Mon Jul 8 16:01:54 2013 -0700
The only difference between sg_pcopy_{from,to}_buffer() and
sg_copy_{from,to}_buffer() is an additional argument that
specifies the number of bytes to skip the SG list before
copying.
The functions have the extra argument at the end, but the kerneldoc
lists it in penultimate position.
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:48 +0000 (14:58 -0700)]
ipc,sysv: return -EINVAL upon incorrect id/seqnum
In ipc_obtain_object_check we return -EIDRM when a bogus sequence number
is detected via ipc_checkid, while the ipc manpages state the following
return codes for such errors:
EIDRM <ID> points to a removed identifier.
EINVAL Invalid <ID> value, or unaligned, etc.
EIDRM should only be returned upon a RMID call (->deleted check), and thus
return EINVAL for wrong seq. This difference in semantics has also caused
real bugs, ie: https://bugzilla.redhat.com/show_bug.cgi?id=246509
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:45 +0000 (14:58 -0700)]
ipc,sysv: make return -EIDRM when racing with RMID consistent
The ipc_lock helper is used by all forms of sysv ipc to acquire the ipc
object's spinlock. Upon error (bogus identifier), we always return
-EINVAL, whether the problem be in the idr path or because we raced with a
task performing RMID. For the later, however, all ipc related manpages,
state the that for:
EIDRM <ID> points to a removed identifier.
And return:
EINVAL Invalid <ID> value, or unaligned, etc.
Which (EINVAL) should only return once the ipc resource is deleted. For
all types of ipc this is done immediately upon a RMID command. However,
shared memory behaves slightly different as it can merely mark a segment
for deletion, and delay the actual freeing until there are no more active
consumers. Per shmctl(IPC_RMID) manpage:
""
Mark the segment to be destroyed. The segment will only actually
be destroyed after the last process detaches it (i.e., when the
shm_nattch member of the associated structure shmid_ds is zero).
""
Unlike ipc_lock, paths that behave "correctly", at least per the manpage,
involve controlling the ipc resource via *ctl(), doing the exact same
validity check as ipc_lock after right acquiring the spinlock:
if (!ipc_valid_object()) {
err = -EIDRM;
goto out_unlock;
}
Thus make ipc_lock consistent with the rest of ipc code and return -EIDRM
in ipc_lock when !ipc_valid_object().
Davidlohr Bueso [Tue, 30 Jun 2015 21:58:39 +0000 (14:58 -0700)]
ipc,msg: provide barrier pairings for lockless receive
We currently use a full barrier on the sender side to to avoid receiver
tasks disappearing on us while still performing on the sender side wakeup.
We lack however, the proper CPU-CPU interactions pairing on the receiver
side which busy-waits for the message. Similarly, we do not need a full
smp_mb, and can relax the semantics for the writer and reader sides of the
message. This is safe as we are only ordering loads and stores to r_msg.
And in both smp_wmb and smp_rmb, there are no stores after the calls
_anyway_.
This obviously applies for pipelined_send and expunge_all, for EIRDM when
destroying a queue.
Akinobu Mita [Tue, 30 Jun 2015 21:58:30 +0000 (14:58 -0700)]
arc: use for_each_sg()
This replaces the plain loop over the sglist array with for_each_sg()
macro which consists of sg_next() function calls. Since arc doesn't
select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in
order to loop over each sg element. But this can help find problems with
drivers that do not properly initialize their sg tables when
CONFIG_DEBUG_SG is enabled.
Josh Triplett [Tue, 30 Jun 2015 21:58:27 +0000 (14:58 -0700)]
devpts: if initialization failed, don't crash when opening /dev/ptmx
If devpts failed to initialize, it would store an ERR_PTR in the global
devpts_mnt. A subsequent open of /dev/ptmx would call devpts_new_index,
which would dereference devpts_mnt and crash.
Avoid storing invalid values in devpts_mnt; leave it NULL instead. Make
both devpts_new_index and devpts_pty_new fail gracefully with ENODEV in
that case, which then becomes the return value to the userspace open call
on /dev/ptmx.
Lorenzo Stoakes [Tue, 30 Jun 2015 21:57:49 +0000 (14:57 -0700)]
gcov: add support for GCC 5.1
Fix kernel gcov support for GCC 5.1. Similar to commit a992bf836f9
("gcov: add support for GCC 4.9"), this patch takes into account the
existence of a new gcov counter (see gcc's gcc/gcov-counter.def.)
Firstly, it increments GCOV_COUNTERS (to 10), which makes the data
structure struct gcov_info compatible with GCC 5.1.
Secondly, a corresponding counter function __gcov_merge_icall_topn (Top N
value tracking for indirect calls) is included in base.c with the other
gcov counters unused for kernel profiling.
HATAYAMA Daisuke [Tue, 30 Jun 2015 21:57:46 +0000 (14:57 -0700)]
kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops path
Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers"
option for kdump after panic_notifers") introduced
"crash_kexec_post_notifiers" kernel boot option, which toggles wheather
panic() calls crash_kexec() before panic_notifiers and dump kmsg or after.
The problem is that the commit overlooks panic_on_oops kernel boot option.
If it is enabled, crash_kexec() is called directly without going through
panic() in oops path.
To fix this issue, this patch adds a check to "crash_kexec_post_notifiers"
in the condition of kexec_should_crash().
Also, put a comment in kexec_should_crash() to explain not obvious things
on this patch.
HATAYAMA Daisuke [Tue, 30 Jun 2015 21:57:43 +0000 (14:57 -0700)]
kernel/panic: call the 2nd crash_kexec() only if crash_kexec_post_notifiers is enabled
For compatibility with the behaviour before the commit f06e5153f4ae2e
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after
panic_notifers"), the 2nd crash_kexec() should be called only if
crash_kexec_post_notifiers is enabled.
Note that crash_kexec() returns immediately if kdump crash kernel is not
loaded, so in this case, this patch makes no functionality change, but the
point is to make it explicit, from the caller panic() side, that the 2nd
crash_kexec() does nothing.
KarimAllah Ahmed [Tue, 30 Jun 2015 21:57:39 +0000 (14:57 -0700)]
x86/kexec: prepend elfcorehdr instead of appending it to the crash-kernel command-line.
Any parameter passed after '--' in the kernel command-line will not be
parsed by the kernel at all, instead it will be passed directly to init
process.
Currently the kernel appends elfcorehdr=<paddr> to the cmdline passed from
kexec load, and if this command-line is used to pass parameters to init
process this means that 'elfcorehdr' will not be parsed as a kernel
parameter at all which will be a problem for vmcore subsystem since it
will know nothing about the location of the ELF structure!
Prepending 'elfcorehdr' instead of appending it fixes this problem since
it ensures that it always comes before '--' and so it's always parsed as a
kernel command-line parameter.
Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was
also used to embedd a command-line to the crash dump kernel and this
command-line contains '--' since the current behavior of the kernel is to
actually append the boot loader command-line to the embedded command-line.
Yann Droneaud [Tue, 30 Jun 2015 21:57:33 +0000 (14:57 -0700)]
fs: allocate structure unconditionally in seq_open()
Since patch described below, from v2.6.15-rc1, seq_open() could use a
struct seq_file already allocated by the caller if the pointer to the
structure is stored in file->private_data before calling the function.
[PATCH] allow callers of seq_open do allocation themselves
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.
As there's no more use for such feature, as it could be easily replaced by
calls to seq_open_private() (see commit 39699037a5c9 ("[FS] seq_file:
Introduce the seq_open_private()")) and seq_release_private() (see
v2.6.0-test3), support for this uncommon feature can be removed from
seq_open().
Yann Droneaud [Tue, 30 Jun 2015 21:57:30 +0000 (14:57 -0700)]
fs: use seq_open_private() for proc_mounts
A patchset to remove support for passing pre-allocated struct seq_file to
seq_open(). Such feature is undocumented and prone to error.
In particular, if seq_release() is used in release handler, it will
kfree() a pointer which was not allocated by seq_open().
So this patchset drops support for pre-allocated struct seq_file: it's
only of use in proc_namespace.c and can be easily replaced by using
seq_open_private()/seq_release_private().
Additionally, it documents the use of file->private_data to hold pointer
to struct seq_file by seq_open().
This patch (of 3):
Since patch described below, from v2.6.15-rc1, seq_open() could use a
struct seq_file already allocated by the caller if the pointer to the
structure is stored in file->private_data before calling the function.
[PATCH] allow callers of seq_open do allocation themselves
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.
Such behavior is only used by mounts_open_common().
In order to drop support for such uncommon feature, proc_mounts is
converted to use seq_open_private(), which take care of allocating the
proc_mounts structure, making it available through ->private in struct
seq_file.
Conversely, proc_mounts is converted to use seq_release_private(), in
order to release the private structure allocated by seq_open_private().
Then, ->private is used directly instead of proc_mounts() macro to access
to the proc_mounts structure.
Mel Gorman [Tue, 30 Jun 2015 21:57:27 +0000 (14:57 -0700)]
mm: meminit: finish initialisation of struct pages before basic setup
Waiman Long reported that 24TB machines hit OOM during basic setup when
struct page initialisation was deferred. One approach is to initialise
memory on demand but it interferes with page allocator paths. This patch
creates dedicated threads to initialise memory before basic setup. It
then blocks on a rw_semaphore until completion as a wait_queue and counter
is overkill. This may be slower to boot but it's simplier overall and
also gets rid of a section mangling which existed so kswapd could do the
initialisation.
Mel Gorman [Tue, 30 Jun 2015 21:57:23 +0000 (14:57 -0700)]
mm: meminit: remove mminit_verify_page_links
mminit_verify_page_links() is an extremely paranoid check that was
introduced when memory initialisation was being heavily reworked.
Profiles indicated that up to 10% of parallel memory initialisation was
spent on checking this for every page. The cost could be reduced but in
practice this check only found problems very early during the
initialisation rewrite and has found nothing since. This patch removes an
expensive unnecessary check.
Mel Gorman [Tue, 30 Jun 2015 21:57:20 +0000 (14:57 -0700)]
mm: meminit: reduce number of times pageblocks are set during struct page init
During parallel sturct page initialisation, ranges are checked for every
PFN unnecessarily which increases boot times. This patch alters when the
ranges are checked.
Mel Gorman [Tue, 30 Jun 2015 21:57:09 +0000 (14:57 -0700)]
mm: meminit: minimise number of pfn->page lookups during initialisation
Deferred struct page initialisation is using pfn_to_page() on every PFN
unnecessarily. This patch minimises the number of lookups and scheduler
checks.
Mel Gorman [Tue, 30 Jun 2015 21:57:05 +0000 (14:57 -0700)]
mm: meminit: initialise remaining struct pages in parallel with kswapd
Only a subset of struct pages are initialised at the moment. When this
patch is applied kswapd initialise the remaining struct pages in parallel.
This should boot faster by spreading the work to multiple CPUs and
initialising data that is local to the CPU. The user-visible effect on
large machines is that free memory will appear to rapidly increase early
in the lifetime of the system until kswapd reports that all memory is
initialised in the kernel log. Once initialised there should be no other
user-visibile effects.
Mel Gorman [Tue, 30 Jun 2015 21:57:02 +0000 (14:57 -0700)]
mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set
This patch initalises all low memory struct pages and 2G of the highest
zone on each node during memory initialisation if
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. That config option cannot be set
but will be available in a later patch. Parallel initialisation of struct
page depends on some features from memory hotplug and it is necessary to
alter alter section annotations.
Mel Gorman [Tue, 30 Jun 2015 21:56:59 +0000 (14:56 -0700)]
mm: meminit: inline some helper functions
early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
unnecessarily visible outside memory initialisation. As well as
unnecessary visibility, it's unnecessary function call overhead when
initialising pages. This patch moves the helpers inline.
Mel Gorman [Tue, 30 Jun 2015 21:56:55 +0000 (14:56 -0700)]
mm: meminit: make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nid
__early_pfn_to_nid() use static variables to cache recent lookups as
memblock lookups are very expensive but it assumes that memory
initialisation is single-threaded. Parallel initialisation of struct
pages will break that assumption so this patch makes __early_pfn_to_nid()
SMP-safe by requiring the caller to cache recent search information.
early_pfn_to_nid() keeps the same interface but is only safe to use early
in boot due to the use of a global static variable. meminit_pfn_in_nid()
is an SMP-safe version that callers must maintain their own state for.
Mel Gorman [Tue, 30 Jun 2015 21:56:52 +0000 (14:56 -0700)]
mm: page_alloc: pass PFN to __free_pages_bootmem
__free_pages_bootmem prepares a page for release to the buddy allocator
and assumes that the struct page is initialised. Parallel initialisation
of struct pages defers initialisation and __free_pages_bootmem can be
called for struct pages that cannot yet map struct page to PFN. This
patch passes PFN to __free_pages_bootmem with no other functional change.
Nathan Zimmer [Tue, 30 Jun 2015 21:56:48 +0000 (14:56 -0700)]
mm: meminit: only set page reserved in the memblock region
Currently each page struct is set as reserved upon initialization. This
patch leaves the reserved bit clear and only sets the reserved bit when it
is known the memory was allocated by the bootmem allocator. This makes it
easier to distinguish between uninitialised struct pages and reserved
struct pages in later patches.
Robin Holt [Tue, 30 Jun 2015 21:56:45 +0000 (14:56 -0700)]
mm: meminit: move page initialization into a separate function
Currently, memmap_init_zone() has all the smarts for initializing a single
page. A subset of this is required for parallel page initialisation and
so this patch breaks up the monolithic function in preparation.
Robin Holt [Tue, 30 Jun 2015 21:56:41 +0000 (14:56 -0700)]
memblock: introduce a for_each_reserved_mem_region iterator
Struct page initialisation had been identified as one of the reasons why
large machines take a long time to boot. Patches were posted a long time ago
to defer initialisation until they were first used. This was rejected on
the grounds it should not be necessary to hurt the fast paths. This series
reuses much of the work from that time but defers the initialisation of
memory to kswapd so that one thread per node initialises memory local to
that node.
After applying the series and setting the appropriate Kconfig variable I
see this in the boot log on a 64G machine
[ 7.383764] kswapd 0 initialised deferred memory in 188ms
[ 7.404253] kswapd 1 initialised deferred memory in 208ms
[ 7.411044] kswapd 3 initialised deferred memory in 216ms
[ 7.411551] kswapd 2 initialised deferred memory in 216ms
On a 1TB machine, I see
[ 8.406511] kswapd 3 initialised deferred memory in 1116ms
[ 8.428518] kswapd 1 initialised deferred memory in 1140ms
[ 8.435977] kswapd 0 initialised deferred memory in 1148ms
[ 8.437416] kswapd 2 initialised deferred memory in 1148ms
Once booted the machine appears to work as normal. Boot times were measured
from the time shutdown was called until ssh was available again. In the
64G case, the boot time savings are negligible. On the 1TB machine, the
savings were 16 seconds.
Nate Zimmer said:
: On an older 8 TB box with lots and lots of cpus the boot time, as
: measure from grub to login prompt, the boot time improved from 1484
: seconds to exactly 1000 seconds.
Waiman Long said:
: I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. From
: grub menu to ssh login, the bootup time was 453s before the patch and 265s
: after the patch - a saving of 188s (42%).
Daniel Blueman said:
: On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing
: stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction with
: this patchset. Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350)
: drops this to 1045s.
This patch (of 13):
As part of initializing struct page's in 2MiB chunks, we noticed that at
the end of free_all_bootmem(), there was nothing which had forced the
reserved/allocated 4KiB pages to be initialized.
This helper function will be used for that expansion.
Tedd Ho-Jeong An [Tue, 30 Jun 2015 18:43:40 +0000 (11:43 -0700)]
Bluetooth: Reinitialize the list after deletion for session user list
If the user->list is deleted with list_del(), it doesn't initialize the
entry which can cause the issue with list_empty(). According to the
comment from the list.h, list_empty() returns false even if the list is
empty and put the entry in an undefined state.
/**
* list_del - deletes entry from list.
* @entry: the element to delete from the list.
* Note: list_empty() on entry does not return true after this, the entry is
* in an undefined state.
*/
Because of this behavior, list_empty() returns false even if list is empty
when the device is reconnected.
So, user->list needs to be re-initialized after list_del(). list.h already
have a macro list_del_init() which deletes the entry and initailze it again.
Craig Gallek [Tue, 30 Jun 2015 16:49:32 +0000 (12:49 -0400)]
sock_diag: don't broadcast kernel sockets
Kernel sockets do not hold a reference for the network namespace to
which they point. Socket destruction broadcasting relies on the
network namespace and will cause the splat below when a kernel socket
is destroyed.
This fix simply ignores kernel sockets when they are destroyed.
Tested:
Using a debug kernel while 'ss -E' is running:
ip netns add test-ns
ip netns delete test-ns
Fixes: eb4cb008529c sock_diag: define destruction multicast groups Fixes: 26abe14379f8 net: Modify sk_alloc to not reference count the
netns of kernel sockets. Reported-by: Dave Jones <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Craig Gallek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
David S. Miller [Tue, 30 Jun 2015 16:37:10 +0000 (09:37 -0700)]
Merge branch 'mvneta-jumbo-frames'
Simon Guinot says:
====================
Fix Ethernet jumbo frames support for Armada 370 and 38x
This patch series fixes the Ethernet jumbo frames support for the SoCs
Armada 370, 380 and 385. Unlike Armada XP, the Ethernet controller for
this SoCs don't support TCP/IP checksumming with a frame size larger
than 1600 bytes.
This patches should be applied to the -stable kernels 3.8 and onwards.
Changes since v1:
- Use a new compatible string for the Ethernet IP found in Armada XP
SoCs (instead of using an optional property).
- Fix the issue for the Armada 380 and 385 SoCs as well.
Changes since v2:
- Add Acked-by from Gregory Clement.
- Add "Fixes:" tag to each commits.
Changes since v3:
- Fix patch 3 name: replace prefix "ARM: mvebu:" with "net: mvneta:".
====================
Simon Guinot [Tue, 30 Jun 2015 14:20:22 +0000 (16:20 +0200)]
net: mvneta: disable IP checksum with jumbo frames for Armada 370
The Ethernet controller found in the Armada 370, 380 and 385 SoCs don't
support TCP/IP checksumming with frame sizes larger than 1600 bytes.
This patch fixes the issue by disabling the features NETIF_F_IP_CSUM and
NETIF_F_TSO for the Armada 370 and compatibles SoCs when the MTU is set
to a value greater than 1600 bytes.
The mvneta driver supports the Ethernet IP found in the Armada 370, XP,
380 and 385 SoCs. Since at least one more hardware feature is available
for the Armada XP SoCs then a way to identify them is needed.
This patch introduces a new compatible string "marvell,armada-xp-neta".
of/irq: Rename "intc_desc" to "of_intc_desc" to fix OF on sh
Now CONFIG_OF can be enabled on sh:
drivers/of/irq.c:472:8: error: redefinition of 'struct intc_desc'
include/linux/sh_intc.h:109:8: note: originally defined here
As "intc_desc" is used all over the place in sh platform code, while
drivers/of/irq.c has a local definition used in a single function,
rename the latter by prefixing it with "of_".
Jeremy Linton [Mon, 29 Jun 2015 23:50:55 +0000 (18:50 -0500)]
of/irq: Fix pSeries boot failure
of_irq_parse_raw() needs to return the correct interrupt controller
node when an interrupt-map property doesn't exist.
It allows of_irq_parse_raw() to return the node pointer of the interrupt
controller, rather than the parent bus. This allows ics_rtas_host_match()
to detect that the controller is a legacy 8259 and avoid using xics.
This avoids an RTAS assertion/crash during early kernel bootstrapping.
netfilter: don't pull include/linux/netfilter.h from netns headers
The issue is that definitions in linux/in.h overlap with those
in netinet/in.h. This patch solves this by introducing the same
mechanism as was used to solve the same problem with linux/in6.h
Will Deacon [Mon, 29 Jun 2015 16:47:42 +0000 (17:47 +0100)]
iommu/arm-smmu: Fix broken ATOS check
Commit 83a60ed8f0b5 ("iommu/arm-smmu: fix ARM_SMMU_FEAT_TRANS_OPS
condition") accidentally negated the ID0_ATOSNS predicate in the ATOS
feature check, causing the driver to attempt ATOS requests on SMMUv2
hardware without the ATOS feature implemented.
This patch restores the predicate to the correct value.
Joerg Roedel [Mon, 29 Jun 2015 08:16:08 +0000 (10:16 +0200)]
iommu: Ignore -ENODEV errors from add_device call-back
The -ENODEV error just means that the device is not
translated by an IOMMU. We shouldn't bail out of iommu
driver initialization when that happens, as this is a common
scenario on ARM.
Not returning -ENODEV in the drivers would be a bad idea, as
the IOMMU core would have no indication whether a device is
translated or not. This indication is not used at the
moment, but will probably be in the future.
Sachin Prabhu [Tue, 16 Jun 2015 15:36:17 +0000 (16:36 +0100)]
cifs: Unset CIFS_MOUNT_POSIX_PATHS flag when following dfs mounts
In a dfs setup where the client transitions from a server which supports
posix paths to a server which doesn't support posix paths, the flag
CIFS_MOUNT_POSIX_PATHS is not reset. This leads to the wrong directory
separator being used causing smb commands to fail.
Consider the following case where a dfs share on a samba server points
to a share on windows smb server.
# mount -t cifs -o .. //vm140-31/dfsroot/testwin/
# ls -l /mnt; touch /mnt/a
total 0
touch: cannot touch ‘/mnt/a’: No such file or directory
Linus Torvalds [Mon, 29 Jun 2015 18:10:56 +0000 (11:10 -0700)]
Merge tag 'md/4.2' of git://neil.brown.name/md
Pull md updates from Neil Brown:
"A mixed bag
- a few bug fixes
- some performance improvement that decrease lock contention
- some clean-up
Nothing major"
* tag 'md/4.2' of git://neil.brown.name/md:
md: clear Blocked flag on failed devices when array is read-only.
md: unlock mddev_lock on an error path.
md: clear mddev->private when it has been freed.
md: fix a build warning
md/raid5: ignore released_stripes check
md/raid5: per hash value and exclusive wait_for_stripe
md/raid5: split wait_for_stripe and introduce wait_for_quiescent
wait: introduce wait_event_exclusive_cmd
md: convert to kstrto*()
md/raid10: make sync_request_write() call bio_copy_data()
This patch restores the slab creation sequence that was broken by commit 4066c33d0308f8 and also reverts the portions that introduced the
KMALLOC_LOOP_XXX macros. Those can never really work since the slab creation
is much more complex than just going from a minimum to a maximum number.
The latest upstream kernel boots cleanly on my machine with a 64 bit x86
configuration under KVM using either SLAB or SLUB.
Linus Torvalds [Mon, 29 Jun 2015 17:34:42 +0000 (10:34 -0700)]
Merge tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
Pull libnvdimm subsystem from Dan Williams:
"The libnvdimm sub-system introduces, in addition to the
libnvdimm-core, 4 drivers / enabling modules:
NFIT:
Instantiates an "nvdimm bus" with the core and registers memory
devices (NVDIMMs) enumerated by the ACPI 6.0 NFIT (NVDIMM Firmware
Interface table).
After registering NVDIMMs the NFIT driver then registers "region"
devices. A libnvdimm-region defines an access mode and the
boundaries of persistent memory media. A region may span multiple
NVDIMMs that are interleaved by the hardware memory controller. In
turn, a libnvdimm-region can be carved into a "namespace" device and
bound to the PMEM or BLK driver which will attach a Linux block
device (disk) interface to the memory.
PMEM:
Initially merged in v4.1 this driver for contiguous spans of
persistent memory address ranges is re-worked to drive
PMEM-namespaces emitted by the libnvdimm-core.
In this update the PMEM driver, on x86, gains the ability to assert
that writes to persistent memory have been flushed all the way
through the caches and buffers in the platform to persistent media.
See memcpy_to_pmem() and wmb_pmem().
BLK:
This new driver enables access to persistent memory media through
"Block Data Windows" as defined by the NFIT. The primary difference
of this driver to PMEM is that only a small window of persistent
memory is mapped into system address space at any given point in
time.
Per-NVDIMM windows are reprogrammed at run time, per-I/O, to access
different portions of the media. BLK-mode, by definition, does not
support DAX.
BTT:
This is a library, optionally consumed by either PMEM or BLK, that
converts a byte-accessible namespace into a disk with atomic sector
update semantics (prevents sector tearing on crash or power loss).
The sinister aspect of sector tearing is that most applications do
not know they have a atomic sector dependency. At least today's
disk's rarely ever tear sectors and if they do one almost certainly
gets a CRC error on access. NVDIMMs will always tear and always
silently. Until an application is audited to be robust in the
presence of sector-tearing the usage of BTT is recommended.
Thanks to: Ross Zwisler, Jeff Moyer, Vishal Verma, Christoph Hellwig,
Ingo Molnar, Neil Brown, Boaz Harrosh, Robert Elliott, Matthew Wilcox,
Andy Rudoff, Linda Knippers, Toshi Kani, Nicholas Moulin, Rafael
Wysocki, and Bob Moore"
* tag 'libnvdimm-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm: (33 commits)
arch, x86: pmem api for ensuring durability of persistent memory updates
libnvdimm: Add sysfs numa_node to NVDIMM devices
libnvdimm: Set numa_node to NVDIMM devices
acpi: Add acpi_map_pxm_to_online_node()
libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only
pmem: flag pmem block devices as non-rotational
libnvdimm: enable iostat
pmem: make_request cleanups
libnvdimm, pmem: fix up max_hw_sectors
libnvdimm, blk: add support for blk integrity
libnvdimm, btt: add support for blk integrity
fs/block_dev.c: skip rw_page if bdev has integrity
libnvdimm: Non-Volatile Devices
tools/testing/nvdimm: libnvdimm unit test infrastructure
libnvdimm, nfit, nd_blk: driver for BLK-mode access persistent memory
nd_btt: atomic sector updates
libnvdimm: infrastructure for btt devices
libnvdimm: write blk label set
libnvdimm: write pmem label set
libnvdimm: blk labels and namespace instantiation
...
Milo Kim [Mon, 29 Jun 2015 00:39:14 +0000 (17:39 -0700)]
leds:lp55xx: fix firmware loading error
LP55xx driver uses not firmware file but raw data to load program through
the firmware interface.(Documents/leds/leds-lp55xx.txt)
For example, here is how to run blinking green channel pattern.
(The second engine is seleted and MUX is mapped to 'RGB' mode)
echo 2 > /sys/bus/i2c/devices/xxxx/select_engine
echo "RGB" > /sys/bus/i2c/devices/xxxx/engine_mux
echo 1 > /sys/class/firmware/lp5562/loading
echo "4000600040FF6000" > /sys/class/firmware/lp5562/data
echo 0 > /sys/class/firmware/lp5562/loading
echo 1 > /sys/bus/i2c/devices/xxxx/run_engine
However, '/sys/class/firmware/<device name>' is not created after the
firmware loader user helper was introduced.
This feature is used in the case below.
As soon as the firmware download is requested by the driver, firmware
class subsystem tries to find the binary file.
If it gets failed, then it just falls back to user helper to load
raw data manually. Here, you can see the device file under
/sys/class/firmware/.
To make it happen, LP55xx driver requires two configurations.
1. Enable CONFIG_FW_LOADER_USER_HELPER_FALLBACK in Kconfig
2. Set option, 'FW_OPT_USERHELPER' on requesting the firmware data.
It means the second option should be 'false' in
request_firmware_nowait().
This option enables to load firmware data manually by calling
fw_load_from_user_helper().
Jacek Anaszewski [Mon, 29 Jun 2015 14:45:23 +0000 (07:45 -0700)]
leds: fix max77693-led build errors
Fix build errors when LEDS_MAX77693=y and V4L2_FLASH_LED_CLASS=m
by restricting LEDS_MAX77693 to =m if V4L2_FLASH_LED_CLASS=m.
drivers/leds/leds-max77693.c:1062: undefined reference to `v4l2_flash_release'
drivers/leds/leds-max77693.c:1068: undefined reference to `v4l2_flash_release'
drivers/built-in.o: In function `max77693_register_led':
drivers/leds/leds-max77693.c:968: undefined reference to `v4l2_flash_init'
drivers/built-in.o: In function `max77693_led_probe':
drivers/leds/leds-max77693.c:1048: undefined reference to `v4l2_flash_release'
Colin Ian King [Mon, 29 Jun 2015 16:10:22 +0000 (17:10 +0100)]
ALSA: Fix uninintialized error return
Static analysis with cppcheck found the following error:
[sound/core/init.c:118]: (error) Uninitialized variable: err
..this was introduced by commit 2471b6c80a70e80de69f5ff4c37187c3912e5874
("ALSA: info: Register proc entries recursively, too") where the call
to snd_info_card_register was removed and no longer setting the error
return in err. When snd_info_create_card_entry fails to allocate a
an entry, the error path exits with garbage in err. Fix is to return
-ENOMEM if entry fails to be allocated.
Fixes: 2471b6c80a ("ALSA: info: Register proc entries recursively, too") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
Randy Dunlap [Fri, 26 Jun 2015 19:00:20 +0000 (12:00 -0700)]
leds: fix aat1290 build errors
Fix build errors when LEDS_AAT1290=y and V4L2_FLASH_LED_CLASS=m
by restricting LEDS_AAT1290 to =m if V4L2_FLASH_LED_CLASS=m.
drivers/built-in.o: In function `aat1290_led_remove':
leds-aat1290.c:(.text+0xe5d77): undefined reference to `v4l2_flash_release'
drivers/built-in.o: In function `aat1290_led_probe':
leds-aat1290.c:(.text+0xe6494): undefined reference to `v4l2_flash_init'
Linus Torvalds [Mon, 29 Jun 2015 16:44:45 +0000 (09:44 -0700)]
Merge tag 'dmaengine-4.2-rc1' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine updates from Vinod Koul:
"This time we have support for few new devices, few new features and
odd fixes spread thru the subsystem.
New devices added:
- support for CSRatlas7 dma controller
- Allwinner H3(sun8i) controller
- TI DMA crossbar driver on DRA7x
- new pxa driver
New features added:
- memset support is bought back now that we have a user in xdmac controller
- interleaved transfers support different source and destination strides
- supporting DMA routers and configuration thru DT
- support for reusing descriptors
- xdmac memset and interleaved transfer support
- hdmac support for interleaved transfers
- omap-dma support for memcpy
Others:
- Constify platform_device_id
- mv_xor fixes and improvements"
* tag 'dmaengine-4.2-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (46 commits)
dmaengine: xgene: fix file permission
dmaengine: fsl-edma: clear pending interrupts on initialization
dmaengine: xdmac: Add memset support
Documentation: dmaengine: document DMA_CTRL_ACK
dmaengine: virt-dma: don't always free descriptor upon completion
dmaengine: Revert "drivers/dma: remove unused support for MEMSET operations"
dmaengine: hdmac: Implement interleaved transfers
dmaengine: Move icg helpers to global header
dmaengine: mv_xor: improve descriptors list handling and reduce locking
dmaengine: mv_xor: Enlarge descriptor pool size
dmaengine: mv_xor: add support for a38x command in descriptor mode
dmaengine: mv_xor: Rename function for consistent naming
dmaengine: mv_xor: bug fix for racing condition in descriptors cleanup
dmaengine: pl330: fix wording in mcbufsz message
dmaengine: sirf: add CSRatlas7 SoC support
dmaengine: xgene-dma: Fix "incorrect type in assignement" warnings
dmaengine: fix kernel-doc documentation
dmaengine: pxa_dma: add support for legacy transition
dmaengine: pxa_dma: add debug information
dmaengine: pxa: add pxa dmaengine driver
...
sctp: Fix race between OOTB responce and route removal
There is NULL pointer dereference possible during statistics update if the route
used for OOTB responce is removed at unfortunate time. If the route exists when
we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
ABORT, but in the meantime route is removed under our feet, we take "no_route"
path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
sctp_transport_set_owner() and therefore there is no asoc associated with this
packet. Probably temporary asoc just for OOTB responces is overkill, so just
introduce a check like in all other places in sctp_packet_transmit(), where
"asoc" is dereferenced.
To reproduce this, one needs to
0. ensure that sctp module is loaded (otherwise ABORT is not generated)
1. remove default route on the machine
2. while true; do
ip route del [interface-specific route]
ip route add [interface-specific route]
done
3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
responce
Tom Lendacky [Mon, 29 Jun 2015 16:22:12 +0000 (11:22 -0500)]
amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
When allocating Rx related buffers, alloc_pages is called using an order
number that is decreased until successful. A system under stress can
experience failures during this allocation process resulting in a warning
being issued. This message can be of concern to end users even though the
failure is not fatal. Since the failure is not fatal and can occur
multiple times, the driver should include the __GFP_NOWARN flag to
suppress the warning message from being issued.
Linus Torvalds [Mon, 29 Jun 2015 16:25:14 +0000 (09:25 -0700)]
Merge tag 'please-pull-misc-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull ia64 updates from Tony Luck:
"Pair of ia64 cleanups"
* tag 'please-pull-misc-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
ia64: Use setup_timer
ia64: export flush_icache_range for module use
Linus Torvalds [Mon, 29 Jun 2015 16:11:10 +0000 (09:11 -0700)]
Merge tag 'linux-kselftest-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update from Shuah Khan:
"This update adds two new test suites: futex and seccomp.
In addition, it includes fixes for bugs in timers, other tests, and
compile framework. It introduces new quicktest feature to enable
users to choose to run tests that complete in a short time"
* tag 'linux-kselftest-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: add quicktest support
selftests: add seccomp suite
selftest, x86: fix incorrect comment
tools selftests: Fix 'clean' target with make 3.81
selftests/futex: Add .gitignore
kselftest: Add exit code defines
selftests: Add futex tests to the top-level Makefile
selftests/futex: Increment ksft pass and fail counters
selftests/futex: Update Makefile to use lib.mk
selftests: Add futex functional tests
kselftests: timers: Check _ALARM clockids are supported before suspending
kselftests: timers: Ease alarmtimer-suspend unreasonable latency value
kselftests: timers: Increase delay between suspends in alarmtimer-suspend
selftests/exec: do not install subdir as it is already created
selftests/ftrace: install test.d
selftests: copy TEST_DIRS to INSTALL_PATH
Test compaction of mlocked memory
selftests/mount: output WARN messages when mount test skipped
selftests/timers: Make git ignore all binaries in timers test suite
Currently, watchdog subsystem require the misc subsystem to
register a watchdog. This may not be the case in case of an
early registration of a watchdog, which can be required when
the watchdog cannot be disabled.
This patch introduces a deferral mechanism to remove this requirement.
Tadeusz Struk [Sat, 27 Jun 2015 06:56:38 +0000 (15:56 +0900)]
crypto: aesni - fix failing setkey for rfc4106-gcm-aesni
rfc4106(gcm(aes)) uses ctr(aes) to generate hash key. ctr(aes) needs
chainiv, but the chainiv gets initialized after aesni_intel when both
are statically linked so the setkey fails.
This patch forces aesni_intel to be initialized after chainiv.
Markus Elfring [Fri, 26 Jun 2015 18:30:11 +0000 (20:30 +0200)]
crypto: qat - Deletion of unnecessary checks before two function calls
The functions kfree() and release_firmware() test whether their argument
is NULL and then return immediately.
Thus the test around the calls is not needed.
This issue was detected by using the Coccinelle software.
Takashi Iwai [Mon, 29 Jun 2015 06:38:02 +0000 (08:38 +0200)]
ALSA: hda - Fix the dock headphone output on Fujitsu Lifebook E780
Fujitsu Lifebook E780 sets the sequence number 0x0f to only only of
the two headphones, thus the driver tries to assign another as the
line-out, and this results in the inconsistent mapping between the
created jack ctl and the actual I/O. Due to this, PulseAudio doesn't
handle it properly and gets the silent output.
This patch series fixes occasional BCM7xxx PHY driver binding failure due
to a harware bug where the first read or write does not come out of the PHY
MDIO management controller.
Since we have two different MDIO controllers using this PHY, a similar
need to be replicated in GENET and UniMAC MDIO.
====================
Florian Fainelli [Fri, 26 Jun 2015 17:39:06 +0000 (10:39 -0700)]
net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs
All BCM7xxx integrated Gigabit PHYs have an issue in their MDIO
management controller which will make the initial read or write to them
to fail and return 0xffff. This is a real issue as the typical first
thing we do is read from MII_PHYSID1 and MII_PHYSID2 from get_phy_id()
to register a driver for these PHYs.
Coupled with the workaround in drivers/net/phy/bcm7xxx.c, this
workaround for the MDIO bus controller consists in scanning the list of
PHYs to do this initial read workaround for as part of the MDIO bus
reset routine which is invoked prior to mdiobus_scan().
Once we have a proper PHY driver/device registered, all workarounds are
located there (e.g: power management suspend/resume calls).
Florian Fainelli [Fri, 26 Jun 2015 17:39:05 +0000 (10:39 -0700)]
net: bcmgenet: workaround initial read failures for integrated PHYs
All BCM7xxx integrated Gigabit PHYs have an issue in their MDIO
management controller which will make the initial read or write to them
to fail and return 0xffff. This is a real issue as the typical first
thing we do is read from MII_PHYSID1 and MII_PHYSID2 from get_phy_id()
to register a driver for these PHYs.
Coupled with the workaround in drivers/net/phy/bcm7xxx.c, this
workaround for the MDIO bus controller consists in scanning the list of
PHYs to do this initial read workaround for as part of the MDIO bus
reset routine which is invoked prior to mdiobus_scan().
Once we have a proper PHY driver/device registered, all workarounds are
located there (e.g: power management suspend/resume calls).
The initial MDIO read or write towards the BCM7xxx integrated PHY may
fail, workaround this by inserting a dummy MII_BMSR read to force the
MDIO management controller to see at least one valid transaction and get
out of stuck state out of reset.
Michal Schmidt [Fri, 26 Jun 2015 15:50:00 +0000 (17:50 +0200)]
bnx2x: fix DMA API usage
With CONFIG_DMA_API_DEBUG=y bnx2x triggers the error "DMA-API: device
driver frees DMA memory with wrong function".
On archs where PAGE_SIZE > SGE_PAGE_SIZE it also triggers "DMA-API:
device driver frees DMA memory with different size".
Fix this by making the mapping and unmapping symmetric:
- Do not map the whole pool page at once. Instead map the
SGE_PAGE_SIZE-sized pieces individually, so they can be unmapped in
the same manner.
- What's mapped using dma_map_page() must be unmapped using
dma_unmap_page().
Tested on ppc64.
Fixes: 4cace675d687 ("bnx2x: Alloc 4k fragment for each rx ring buffer element") Signed-off-by: Michal Schmidt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Before, the symbols depended implicitly on HAS_DMA through PCI or
USE_OF. Add explicit dependencies on HAS_DMA to fix this.
Fixes: b7d3282a245f4428 ("net: via/Kconfig: replace USE_OF with OF_???") Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]>
Steve French [Thu, 18 Jun 2015 09:49:47 +0000 (04:49 -0500)]
Update negotiate protocol for SMB3.11 dialect
Send negotiate contexts when SMB3.11 dialect is negotiated
(ie the preauth and the encryption contexts) and
Initialize SMB3.11 preauth negotiate context salt to random bytes
Followon patch will update session setup and tree connect
Steve French [Sun, 28 Jun 2015 04:18:36 +0000 (21:18 -0700)]
Add reflink copy over SMB3.11 with new FSCTL_DUPLICATE_EXTENTS
Getting fantastic copy performance with cp --reflink over SMB3.11
using the new FSCTL_DUPLICATE_EXTENTS.
This FSCTL was added in the SMB3.11 dialect (testing was
against REFS file system) so have put it as a 3.11 protocol
specific operation ("vers=3.1.1" on the mount). Tested at
the SMB3 plugfest in Redmond.
It depends on the new FS Attribute (BLOCK_REFCOUNTING) which
is used to advertise support for the ability to do this ioctl
(if you can support multiple files pointing to the same block
than this refcounting ability or equivalent is needed to
support the new reflink-like duplicate extent SMB3 ioctl.
David S. Miller [Mon, 29 Jun 2015 00:20:02 +0000 (17:20 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-06-26
This series contains fixes for igb, e1000e and i40evf.
Todd disables IPv6 extension header processing due to a hardware errata
and bumps the driver version.
Yanir provides six fixes for e1000e. First is a fix for a locking issue
where we were not always taking the pci_bus_sem semaphore all the time
when calling pci_disable_link_state_locked(), so fix the code to only call
pci_disable_link_state_locked() when the semaphore has been acquired,
otherwise call pci_disable_link_state(). A previous fix for i219 where
the hardware prevented ULP entry caused EEE in Sx not the be enabled, so
modify the code flow that allows both ULP and EEE in Sx. Fix an issue
when running 10/100 full duplex on i219 where CRC errors were occurring
by increasing the IPG from 8 to 0xC as per the hardware developers.
Fix a data corruption issue found on some platforms by increasing the
minimum gap between the PHY FIFO read and write pointers. Fix i219,
which does not require the K1 workaround for LPT devices.
Mitch provides a i40evf fix for a panic when changing MTU. Down was
requesting queue disables, but then exited immediately without waiting
for the queues to actually be disabled. This could allow any function
called after i40evf_down() to run immediately, including i40evf_up(),
and causes a memory leak. Fixed the issue by removing the whole
reinit_locked function which allows for the driver to handle the state
changes by requesting reset from the periodic timer. The second fix
resolves an issue where RSS was being configured as though it is using
the maximum number of queue. This can cause the device to drop a lot
of receive traffic, as the packets get assigned to non-functional queues.
This is resolved by only configuring RSS with the number of active queues.
====================