Git Repo - linux.git/log

quota: optimize mark_dirty logic

- Skip locking if quota is dirty already.
- Return old quota state to help fs-specific implementations optimize
the case where quota was dirty already.
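
The two points above can be illustrated with a minimal user-space sketch. It assumes a hypothetical struct dquot_sketch with a C11 atomic dirty flag and a small spinlock standing in for the lock that guards the dirty list; none of these names come from the patch itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a quota structure; not the kernel's struct dquot. */
struct dquot_sketch {
    atomic_bool dirty;
    atomic_flag list_lock;      /* stands in for the lock guarding the dirty list */
    int lock_acquisitions;      /* counts how often the slow path took the lock */
};

static void dquot_sketch_init(struct dquot_sketch *dq)
{
    atomic_init(&dq->dirty, false);
    atomic_flag_clear(&dq->list_lock);
    dq->lock_acquisitions = 0;
}

/* Returns the previous dirty state: true if the quota was already dirty. */
static bool mark_dirty_sketch(struct dquot_sketch *dq)
{
    /* Fast path: already dirty, so skip the lock entirely. */
    if (atomic_load(&dq->dirty))
        return true;

    while (atomic_flag_test_and_set(&dq->list_lock))
        ;                       /* spin until the lock is free */
    dq->lock_acquisitions++;

    /* Re-check under the lock; another thread may have won the race. */
    bool was_dirty = atomic_exchange(&dq->dirty, true);

    atomic_flag_clear(&dq->list_lock);
    return was_dirty;
}
```

A filesystem-specific caller could use the return value to skip its own bookkeeping when the quota was dirty before the call.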

Signed-off-by: Dmitry Monakhov <[email protected]>
Signed-off-by: Jan Kara <[email protected]>

ext2: Avoid loading bitmaps for full groups during block allocation

There is no point in loading bitmap for groups which are completely full.
This causes noticeable performance problems (and memory pressure) on small
systems with large full filesystem
(http://marc.info/?l=linux-ext4&m=126843108314310&w=2).

Port of the same ext3 patch.

Signed-off-by: Jan Kara <[email protected]>

ext3: Avoid loading bitmaps for full groups during block allocation

There is no point in loading bitmap for groups which are completely full.
This causes noticeable performance problems (and memory pressure) on small
systems with large full filesystem
(http://marc.info/?l=linux-ext4&m=126843108314310&w=2).

Jan Kara: Added a comment and changed check to use cpu-endian value.
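
A rough sketch of the check, assuming a simplified, hypothetical group descriptor whose free block count is stored little-endian on disk. The structure and function names are illustrative and are not the ext3 ones.

```c
#include <stdint.h>

/* Hypothetical, simplified group descriptor; the on-disk field is little-endian. */
struct group_desc_sketch {
    uint8_t free_blocks_count_le[2];    /* 16-bit little-endian free block count */
};

/* Convert the little-endian on-disk value to a CPU-endian integer. */
static uint16_t free_blocks_cpu(const struct group_desc_sketch *gd)
{
    return (uint16_t)(gd->free_blocks_count_le[0] |
                      (gd->free_blocks_count_le[1] << 8));
}

/*
 * Return the index of the first group at or after `start` (wrapping around)
 * that still has free blocks, or -1 if every group is full. A caller would
 * load the block bitmap only for the group returned here.
 */
static int first_group_with_space(const struct group_desc_sketch *gds,
                                  int ngroups, int start)
{
    for (int i = 0; i < ngroups; i++) {
        int g = (start + i) % ngroups;

        if (free_blocks_cpu(&gds[g]) == 0)
            continue;           /* full group: skip without touching its bitmap */
        return g;
    }
    return -1;
}
```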

Signed-off-by: "Frans van de Wiel" <[email protected]>
Signed-off-by: Jan Kara <[email protected]>

Fix networking tree iscsi_tcp.c mis-merge

The removal of the 'waitqueue_active()' test in commit d7d05548a6
("[SCSI] iscsi_tcp: fix relogin/shutdown hang") got incorrectly resolved
by David when he back-merged the main git tree into the networking tree
in commit 278554bd65 ("Merge branch 'master' of master.kernel.org:...").

There was a content conflict due to 'sock->sk->sk_sleep' being changed
into 'sk_sleep(sock->sk)' in the networking tree, but David didn't pick
up the iscsi change from the main tree.

Reported-by: James Bottomley <[email protected]>
Cc: David Miller <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

i2c-nforce2: Remove redundant error messages on ACPI conflict

The ACPI subsystem strictly checks for resource conflicts. When there's
a conflict, it outputs a warning message with all the details needed to
properly diagnose the underlying issue. However, the i2c-nforce2 driver
also prints its own message. Not only is the message redundant, it is at
the KERN_ERR level, which overrides some bootsplash screens for no good
reason. This change removes the two lines that print out the error
messages.

Signed-off-by: Chase Douglas <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c: Use <linux/io.h> instead of <asm/io.h>

As warned by checkpatch.pl, <linux/io.h> should be used instead of
<asm/io.h>.

Signed-off-by: H Hartley Sweeten <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c-algo-pca: Fix coding style issues

Fix up some coding style issues. i2c-algo-pca.c has been built
successfully after applying this patch and the binary object is
still exactly the same. Other issues found by checkpatch.pl were
voluntarily not fixed, either to keep readability, or because of
false positive errors.

Signed-off-by: Farid Hammane <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c-dev: Fix all coding style issues

Fix all coding style issues found by checkpatch.pl.

Signed-off-by: Farid Hammane <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c-core: Fix some coding style issues

Fix up coding style issues found by the checkpatch.pl tool.

Signed-off-by: Farid Hammane <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c-gpio: Move initialization code to subsys_initcall()

A GPIO-driven I2C bus can be used for controlling the PMIC chip. One
example of such a configuration is the Samsung Aquila board.

This patch moves initialization code to subsys_initcall() to ensure
that the i2c bus is available early so the regulators can be quickly
probed and available for other devices on their probe() call.

This solution was proposed by Mark Brown to fix the problem of
the regulators not being available on the peripheral device probe():
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-March/011971.html

Cc: Mark Brown <[email protected]>
Reviewed-by: Kyungmin Park <[email protected]>
Signed-off-by: Marek Szyprowski <[email protected]>
Acked-by: Wolfram Sang <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c-parport: Make template structure const

parport_algo_data is a template so it can be marked const.

Signed-off-by: Jean Delvare <[email protected]>

i2c-dev: Remove unnecessary casts

The private_data member of struct file is a void *, so there is no need
to cast it.

Signed-off-by: H Hartley Sweeten <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

at24: Fall back to byte or word reads if needed

Increase the portability of the at24 driver by letting it read from
EEPROM chips connected to cheap SMBus controllers that support neither
raw I2C messages nor even I2C block reads. All SMBus controllers
should support either word reads or byte reads, so read support
becomes universal, much like with the legacy "eeprom" driver.

Obviously, this only works with EEPROM chips up to AT24C16, that use
8-bit offset addressing. 16-bit offset addressing is almost impossible
to support on SMBus controllers.
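
The fallback order described above could look roughly like the following sketch. The capability bits and the enum are hypothetical names invented for this illustration; they are not the kernel's I2C_FUNC_* constants or the driver's actual code.

```c
/* Hypothetical capability bits for this sketch only. */
enum ctrl_caps {
    CAP_RAW_I2C        = 1 << 0,    /* raw I2C messages */
    CAP_I2C_BLOCK_READ = 1 << 1,    /* SMBus-style I2C block reads */
    CAP_WORD_READ      = 1 << 2,    /* SMBus word reads */
    CAP_BYTE_READ      = 1 << 3,    /* SMBus byte reads */
};

enum read_method {
    READ_RAW_I2C,
    READ_I2C_BLOCK,
    READ_WORD,
    READ_BYTE,
    READ_UNSUPPORTED,
};

/* Pick the fastest read method the controller offers, slowest last. */
static enum read_method pick_read_method(unsigned int caps)
{
    if (caps & CAP_RAW_I2C)
        return READ_RAW_I2C;
    if (caps & CAP_I2C_BLOCK_READ)
        return READ_I2C_BLOCK;
    if (caps & CAP_WORD_READ)
        return READ_WORD;
    if (caps & CAP_BYTE_READ)
        return READ_BYTE;
    return READ_UNSUPPORTED;
}
```

Word and byte reads move only two bytes or one byte per transfer, which is where the performance cost mentioned below comes from.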

I did not add universal support for writes, as I had no immediate need
for this, but it could be added later if needed (with the same
performance issue as byte and word reads have, of course.)

Signed-off-by: Jean Delvare <[email protected]>
Reviewed-by: Wolfram Sang <[email protected]>
Cc: Konstantin Lazarev <[email protected]>

i2c-stub: Expose the default functionality flags

It is easier to adjust the flags when you know their default value.

Signed-off-by: Jean Delvare <[email protected]>
Cc: Mark M. Hoffman <[email protected]>

i2c/scx200_acb: Make PCI device ids constant

Make PCI device ids constant as we just did for many other i2c bus
drivers already.

Signed-off-by: Jean Delvare <[email protected]>
Cc: Márton Németh <[email protected]>

i2c-i801: Fix all checkpatch warnings

Fix all checkpatch warnings. No functional changes are made.

Signed-off-by: Ivo Manca <[email protected]>
Signed-off-by: Jean Delvare <[email protected]>

i2c-i801: All newer devices have all the optional features

Only the oldest devices lack some of the features supported by this
driver. List them explicitly, and default to all features enabled for
all other chips, including the ones added through sysfs. This will
make future driver maintenance easier.

In the unlikely event of a not yet supported device not implementing
all the features, one can always use the disable_features module
parameter to prevent the driver from attempting to use them.

Signed-off-by: Jean Delvare <[email protected]>
Acked-by: Seth Heasley <[email protected]>

i2c-i801: Let the user disable selected driver features

Let the user disable selected features normally supported by the
device. This makes it possible to work around possible driver or
hardware bugs if the feature in question doesn't work as intended
for whatever reason.
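
As an illustration only, the effect of such a parameter can be sketched as a bitmask operation. The feature bit names and values below are assumptions made for this sketch, not the driver's definitions.

```c
/* Hypothetical feature bits; names and values are illustrative only. */
#define FEAT_PEC            (1u << 0)
#define FEAT_BLOCK_BUFFER   (1u << 1)
#define FEAT_BLOCK_PROC     (1u << 2)
#define FEAT_I2C_BLOCK_READ (1u << 3)
#define FEAT_ALL (FEAT_PEC | FEAT_BLOCK_BUFFER | \
                  FEAT_BLOCK_PROC | FEAT_I2C_BLOCK_READ)

/*
 * Devices on an explicit "old device" list get only the features listed for
 * them; every other device defaults to all features. The user-supplied
 * disable_features mask is then removed from whatever was selected.
 */
static unsigned int effective_features(int is_listed_old_device,
                                       unsigned int old_device_features,
                                       unsigned int disable_features)
{
    unsigned int features =
        is_listed_old_device ? old_device_features : FEAT_ALL;

    return features & ~disable_features;
}
```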

Signed-off-by: Jean Delvare <[email protected]>
Cc: Felix Rubinstein <[email protected]>

net: Expose all network devices in a namespace in sysfs

This reverts commit aaf8cdc34ddba08122f02217d9d684e2f9f5d575.

Drivers like the ipw2100 call device_create_group when they
are initialized and device_remove_group when they are shutdown.
Moving them between namespaces deletes their sysfs groups early.

In particular the following call chain results.
netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
With sysfs_remove_dir recursively deleting all of its subdirectories,
and nothing adding them back.

Ouch!

Therefore we need to call something that ultimately calls sysfs_mv_dir,
as that sysfs function can move sysfs directories between namespaces
without deleting their subdirectories or their contents. This allows
us to avoid placing extra boilerplate into every driver that does
something interesting with sysfs.

Currently the function that provides that capability is device_rename.
That is, the code works without nasty side effects as originally written.

So remove the misguided fix for moving devices between namespaces. The
bug in the kobject layer that inspired it has now been recognized and
fixed.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

hotplug: netns aware uevent_helper

It only makes sense for uevent_helper to get events
in the initial namespaces. Its invocation is not
per namespace and it is not clear how we could make
its invocation namespace aware.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

kobj: Send hotplug events in the proper namespace.

Utilize netlink_broadcast_filtered to allow sending hotplug events
in the proper namespace.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

netlink: Implement netlink_broadcast_filtered

When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to. For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces netlink_broadcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.
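
The filtering idea can be sketched in plain C as a broadcast loop that consults a caller-supplied callback for each destination. The types and function names below are hypothetical and exist only for this illustration; the convention that a nonzero return means "skip this socket" is an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical listener; stands in for a listening netlink socket. */
struct listener_sketch {
    int ns_id;          /* which namespace the listener lives in */
    int received;       /* number of packets delivered to it */
};

/* Filter callback: return nonzero to skip the destination. */
typedef int (*tx_filter_fn)(const struct listener_sketch *dst, void *data);

/* Deliver one packet to every listener the filter accepts; return the count. */
static int broadcast_filtered_sketch(struct listener_sketch *ls, size_t n,
                                     tx_filter_fn filter, void *data)
{
    int delivered = 0;

    for (size_t i = 0; i < n; i++) {
        if (filter && filter(&ls[i], data))
            continue;   /* filter rejected this destination */
        ls[i].received++;
        delivered++;
    }
    return delivered;
}

/* Example filter: only deliver inside the namespace passed through `data`. */
static int skip_other_namespaces(const struct listener_sketch *dst, void *data)
{
    const int *wanted_ns = data;

    return dst->ns_id != *wanted_ns;
}
```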

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

net/sysfs: Fix the bitrot in network device kobject namespace support

I had a couple of stupid bugs in:
netns: Teach network device kobjects which namespace they are in.

- I duplicated the Kconfig for the NET_NS
- The build was broken when sysfs was not compiled in

The sysfs breakage is because after I moved the operations
for the sysfs to the kobject layer, to make things cleaner,
I forgot to move the ifdefs. Oops.

I'm not quite certain how I introduced a second NET_NS Kconfig,
but it was probably a 3-way merge somewhere along the way that
did not notice that the NET_NS Kconfig option had moved and thought
that was a bug. It probably slipped in because the sysfs patches
used to be the first patches in my network namespace patches.
Some things just don't go like you would expect.

Neither of these bugs actually affect anything in the common case
but they should be fixed.

Thanks to Serge for noticing they were present.

Reported-by: Serge E. Hallyn <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>

netns: Teach network device kobjects which namespace they are in.

The problem. Network devices show up in sysfs and with the network
namespace active multiple devices with the same name can show up in
the same directory, ouch!

To avoid that problem and allow existing applications in network namespaces
to see the same interface that is currently presented in sysfs, this
patch enables the tagging directory support in sysfs.

By using the network namespace pointers as tags to separate out
the sysfs directory entries we ensure that we don't have conflicts
in the directories and applications only see a limited set of
the network devices.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

kobject: Send hotplug events in all network namespaces

Open a copy of the uevent kernel socket in each network
namespace so we can send uevents in all network namespaces.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: David S. Miller <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

driver-core: fix Typo in drivers/base/core.c for CONFIG_MODULE

In this code section the final S of CONFIG_MODULES was missed, making
the whole check useless.

Signed-off-by: Christoph Egger <[email protected]>
Cc: Mark McLoughlin <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

pci: check caps from sysfs file open to read device dependent config space

The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN
check to verify privileges before allowing a user to read device
dependent config space.  This is meant to protect from an unprivileged
user potentially locking up the box.

When assigning a PCI device directly to a guest with libvirt and KVM,
the sysfs config space file is chown'd to the unprivileged user that
the KVM guest will run as.  The guest needs to have full access to the
device's config space since it's responsible for driving the device.
However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN
check will not allow read access beyond the config header.

With this patch we check privileges against the capabilities used when
opening the sysfs file.  This allows a privileged process to open the
file and hand it to an unprivileged process, and the unprivileged process
can still read all of the config space.

Signed-off-by: Chris Wright <[email protected]>
Acked-by: Jesse Barnes <[email protected]>
Cc: Alan Cox <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Implement sysfs tagged directory support.

The problem.  When implementing a network namespace I need to be able
to have multiple network devices with the same name.  Currently this
is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
potentially a few other directories of the form /sys/ ... /net/*.

What this patch does is to add an additional tag field to the
sysfs dirent structure.  For directories that should show different
contents depending on the context such as /sys/class/net/, and
/sys/devices/virtual/net/ this tag field is used to specify the
context in which those directories should be visible.  Effectively
this is the same as creating multiple distinct directories with
the same name but internally to sysfs the result is nicer.

I am calling the concept of a single directory that looks like multiple
directories all at the same path in the filesystem tagged directories.

For the networking namespace the set of directories whose contents I need
to filter with tags can depend on the presence or absence of hotplug
hardware or which modules are currently loaded.  This means I need
a simple, race-free way to set up those directories as tagged.

To achieve a race-free design all tagged directories are created
and managed by sysfs itself.

Users of this interface:
- define a type in the sysfs_tag_type enumeration.
- call sysfs_register_ns_types with the type and its operations
- sysfs_exit_ns when an individual tag is no longer valid

- Implement mount_ns() which returns the ns of the calling process
  so we can attach it to a sysfs superblock.
- Implement ktype.namespace() which returns the ns of a sysfs kobject.

Everything else is left up to sysfs and the driver layer.

For the network namespace mount_ns and namespace() are essentially
one line functions, and look to remain that.

Tags are currently represented as const void * pointers, as that is
generic, provides enough information for equality comparisons,
and is trivial to create for current users, as it is just the
existing namespace pointer.
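
To make the tag comparison concrete, here is a small sketch of a lookup that matches on both name and tag. struct dirent_sketch and find_tagged are hypothetical names for this illustration and do not reflect the real sysfs_dirent layout.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name plus an opaque tag. */
struct dirent_sketch {
    const char *name;
    const void *tag;    /* e.g. a namespace pointer; only compared for equality */
};

/*
 * Find the entry called `name` that is visible in the context identified by
 * `tag`. Two entries may share a name as long as their tags differ.
 */
static const struct dirent_sketch *
find_tagged(const struct dirent_sketch *ents, size_t n,
            const char *name, const void *tag)
{
    for (size_t i = 0; i < n; i++) {
        if (ents[i].tag == tag && strcmp(ents[i].name, name) == 0)
            return &ents[i];
    }
    return NULL;
}
```

With this shape, two network namespaces can each see their own "eth0" at the same path, because the lookup never returns an entry carrying a different tag.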

The work needed in sysfs is more extensive.  At each directory
or symlink creation I need to check if the directory it is being
created in is a tagged directory and if so generate the appropriate
tag to place on the sysfs_dirent.  Likewise at each symlink or
directory removal I need to check if the sysfs directory it is
being removed from is a tagged directory and if so figure out
which tag goes along with the name I am deleting.

Currently only directories which hold kobjects, and
symlinks are supported.  There is not enough information
in the current file attribute interfaces to give us anything
to discriminate on which makes it useless, and there are
no potential users which makes it an uninteresting problem
to solve.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Benjamin Thery <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: add struct file* to bin_attr callbacks

This allows bin_attr->read,write,mmap callbacks to check file specific data
(such as inode owner) as part of any privilege validation.

Signed-off-by: Chris Wright <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

kobj: Add basic infrastructure for dealing with namespaces.

Move complete knowledge of namespaces into the kobject layer
so we can use that information when reporting kobjects to
userspace.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Remove usage of S_BIAS to avoid merge conflict with the vfs tree

In Al's latest vfs tree the code is reworked and S_BIAS has been removed.

It turns out that checking to see if a super block is in the
middle of an unmount in sysfs_exit_ns is unnecessary because we
remove the super_block from the s_supers/s_instances list before
struct sysfs_super_info pointed to by sb->s_fs_info is freed.

For now just delete the unnecessary check to see if a superblock is in the
middle of an unmount, it isn't necessary with or without Al's changes
and it just causes a needless conflict.

Reported-by: Stephen Rothwell <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Remove double free sysfs_get_sb

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Don't use enums in inline function declaration.

It appears gcc can't cope with using an enum that is only declared in
an inline function declaration, that doesn't even use the variable
that is so declared.

Avoid the silliness and replace the enum with an int, and make gcc
happy.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs-namespaces: add a high-level Documentation file

The first three paragraphs are almost verbatim taken from Eric's
commit message on the patch introducing network ns tags.  The next
two paragraphs I wrote to be a brief high level overview.  The last
section is taken from the commit message on "Implement sysfs tagged
directory support", but updated.  Hopefully correctly.

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Comment sysfs directory tagging logic

Add some in-line comments to explain the new infrastructure, which
was introduced to support sysfs directory tagging with namespaces.
I think an overall description someplace might be good too, but it
didn't really seem to fit into Documentation/filesystems/sysfs.txt,
which appears more geared toward users, rather than maintainers, of
sysfs.

(Tejun, please let me know if I can make anything clearer or failed
altogether to comment something that should be commented.)

Signed-off-by: Serge E. Hallyn <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

driver core: Implement ns directory support for device classes.

device_del and device_rename were modified to use
sysfs_delete_link and sysfs_rename_link respectively to ensure
when these operations happen on devices whose classes
are in namespace directories they work properly.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Benjamin Thery <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Implement sysfs_delete_link

When removing a symlink sysfs_remove_link does not provide
enough information to figure out which tagged directory the symlink
falls in.  So I need sysfs_delete_link which is passed the target
of the symlink to delete.

sysfs_rename_link is updated to call sysfs_delete_link instead
of sysfs_remove_link as we have all of the information necessary
and the callers are interesting.

Both of these functions now have enough information to find a symlink
in a tagged directory.  The only restriction is that they must be called
before the target kobject is renamed or deleted.  If they are called
later I lose track of which tag the target kobject was marked with
and can no longer find the old symlink to remove it.

This patch was split from an earlier patch.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Benjamin Thery <[email protected]>
Signed-off-by: Daniel Lezcano <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Add support for tagged directories with untagged members.

I had hoped to avoid this but the bonding driver adds a file
to /sys/class/net/ and the easiest way to handle that file is
to make it untagged and to register it only once.

So relax the rules on tagged directories, and make bonding work.

Signed-off-by: Eric W. Biederman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

sysfs: Basic support for multiple super blocks

Add all of the necessary boilerplate to support
multiple superblocks in sysfs.

Signed-off-by: Eric W. Biederman <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

generate "change" uevent for loop device

Recent udev versions probe loop devices for filesystems meaning that
the /dev/disk hierarchy may contain useful entries such as

$ ls -l /dev/disk/by-label/Fedora-12-x86_64-Live
lrwxrwxrwx 1 root root 11 Mar 11 13:41 /dev/disk/by-label/Fedora-12-x86_64-Live -> ../../loop0

Unfortunately, no "change" uevent is generated when the loop device is
detached so the symlink persists. Additionally, no "change" uevent is
guaranteed to be generated when attaching an fd or changing capacity.
For example, user space could open the loop device O_RDONLY (in fact,
recent util-linux-ng does this) so udev's OPTIONS+="watch" machinery may
not trigger the "change" uevent.

This patch ensures that the "change" uevent is generated in all of
these cases. As a result, the /dev/disk hierarchy works as expected
for loop devices.

Signed-off-by: David Zeuthen <[email protected]>
Acked-by: Kay Sievers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Driver core: Protect device shutdown from hot unplug events.

While device_shutdown() walks through devices_kset to shutdown all
devices, device unplug events may race to shutdown individual devices.
Specifically, sd_shutdown(), on behalf of fc_starget_delete(), has
been observed deleting devices during device_shutdown()'s list
traversal. So we factor out list_for_each_entry_safe_reverse(...) in
favor of while (!list_empty(...)).
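
The difference between the two loop styles can be shown with a small user-space sketch. It assumes a hand-rolled circular list and a hypothetical dev_sketch type in which one device's shutdown removes another device, mimicking the race described above; it is not the driver core code.

```c
#include <stddef.h>

/* Hypothetical device node on a circular doubly linked list. */
struct dev_sketch {
    struct dev_sketch *prev, *next;
    struct dev_sketch *victim;  /* device removed as a side effect of shutdown */
    int shut_down;
};

static void list_init_sketch(struct dev_sketch *head)
{
    head->prev = head->next = head;
}

static int list_is_empty_sketch(const struct dev_sketch *head)
{
    return head->next == head;
}

static void list_add_tail_sketch(struct dev_sketch *head, struct dev_sketch *d)
{
    d->prev = head->prev;
    d->next = head;
    head->prev->next = d;
    head->prev = d;
}

static void list_del_sketch(struct dev_sketch *d)
{
    d->prev->next = d->next;
    d->next->prev = d->prev;
    d->prev = d->next = d;      /* self-linked, so a second delete is harmless */
}

/*
 * Drain the list from the tail. The tail is re-read on every iteration, so
 * entries removed behind our back are never visited through a stale saved
 * pointer, which is the hazard with a "safe" iterator that caches the
 * neighbouring entry.
 */
static int shutdown_all_sketch(struct dev_sketch *head)
{
    int count = 0;

    while (!list_is_empty_sketch(head)) {
        struct dev_sketch *d = head->prev;

        list_del_sketch(d);
        if (d->victim)
            list_del_sketch(d->victim);     /* simulated hot unplug */
        d->shut_down = 1;
        count++;
    }
    return count;
}
```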

Signed-off-by: Hugh Daschbach <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

firmware loader: do not allocate firmware id separately

fw_id has the same life time as firmware_priv so it makes sense to move
it into firmware_priv structure instead of allocating separately.

Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

firmware loader: split out builtin firmware handling

Split builtin firmware handling into separate functions to clean up the
main body of code.

Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

firmware loader: rely on driver core to create class attribute

Do not create 'timeout' attribute manually, let driver core do it for us.
This also ensures that attribute is cleaned up properly.

Signed-off-by: Dmitry Torokhov <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

firmware class: export nowait to userspace

When we use request_firmware_nowait(), userspace may
not want to answer negatively right away when for
example it is answering from an initrd only, but
with request_firmware() it has to in order to not
delay the kernel boot until the request times out.

This allows userspace to differentiate between the
two in order to be able to reply negatively to async
requests only when all filesystems have been mounted
and have been checked for the requested firmware file.

Signed-off-by: Johannes Berg <[email protected]>
Cc: Kay Sievers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

lockdep: Add novalidate class for dev->mutex conversion

The conversion of device->sem to device->mutex resulted in lockdep
warnings. Create a novalidate class for now until the driver folks
come up with separate classes. That way we have at least the basic
mutex debugging coverage.

Add a checkpatch error so the usage is reserved for device->mutex.

[ tglx: checkpatch and compile fix for LOCKDEP=n ]

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drivers/base: Convert dev->sem to mutex

The semaphore is semantically a mutex. Convert it to a real mutex and
fix up a few places where code was relying on semaphore.h to be included
by device.h, as well as the users of the trylock function, as that value
is now reversed.

Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

platform_bus: allow custom extensions to system PM methods

When runtime PM for platform_bus was added, it allowed for platforms
to customize the runtime PM methods since they are defined as weak
symbols.

This patch allows platforms to also extend the system PM methods with
custom hooks so runtime PM and system PM extensions can be managed
together by custom platform-specific code.

Signed-off-by: Kevin Hilman <[email protected]>
Cc: Magnus Damm <[email protected]>
Cc: Rafael Wysocki <[email protected]>
Cc: Dmitry Torokhov <[email protected]>
Cc: Eric Miao <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

devtmpfs: support !CONFIG_TMPFS

Make devtmpfs available on (embedded) configurations without SHMEM/TMPFS,
using ramfs instead.

Saves ~15KB.

Signed-off-by: Peter Korsgaard <[email protected]>
Acked-by: Kay Sievers <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

driver core: module.c: Use kasprintf

kasprintf combines kmalloc and sprintf, and takes care of the size
calculation itself.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression a,flag;
expression list args;
statement S;
@@

  a =
-  \(kmalloc\|kzalloc\)(...,flag)
+  kasprintf(flag,args)
  <... when != a
  if (a == NULL || ...) S
  ...>
- sprintf(a,args);
// </smpl>
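
For readers unfamiliar with the pattern, a user-space analogue of "allocate, then format" collapsed into one call might look like the sketch below. asprintf_sketch is a hypothetical helper built on malloc and vsnprintf; unlike kasprintf it takes no allocation flags.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format into a freshly allocated buffer of exactly the right size.
 * Returns NULL on allocation or formatting failure; the caller frees it.
 */
static char *asprintf_sketch(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    va_start(ap, fmt);
    va_copy(ap2, ap);

    /* First pass: measure the formatted length without writing anything. */
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }

    /* Second pass: allocate len + 1 bytes and format for real. */
    buf = malloc((size_t)len + 1);
    if (buf)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);
    va_end(ap2);
    return buf;
}
```

The caller no longer computes the buffer size by hand, which removes a common source of off-by-one mistakes.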

Signed-off-by: Julia Lawall <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Driver core: don't initialize wakeup flags

This patch (as1351) removes an unnecessary and unwanted assignment
from device_initialize(). The wakeup flags are set to 0 along with
everything else when the device structure is allocated, so we don't
need to do it again. Furthermore, the subsystem might already have
set these flags to their correct values; we don't want to override it.

Signed-off-by: Alan Stern <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

driver-core: fix potential race condition in drivers/base/dd.c

This patch fixes a potential race condition in the driver_bound() function
in the file drivers/base/dd.c.

The broadcast of the BUS_NOTIFY_BOUND_DRIVER notifier should be done
after adding the new device to the driver list. Otherwise notifier
listeners will fail if they use functions like usb_find_interface().

The patch is against kernel 2.6.33. Please merge it.

Signed-off-by: Stefani Seibold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Driver core: Reduce the level of request_firmware() messages

The messages from _request_firmware() informing that firmware is
being requested or built-in firmware is going to be used are printed
at KERN_INFO, which produces lots of noise on systems with huge
numbers of AMD CPUs. Reduce the level of these messages to
KERN_DEBUG to get rid of that noise.

Signed-off-by: Rafael J. Wysocki <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

kref: remove kref_set

Of the three uses of kref_set in the kernel:

One really should be kref_put as the code is letting go of a
reference,
Two really should be kref_init because the kref is being
initialised.

This suggests that making kref_set available encourages bad code.
So fix the three uses and remove kref_set completely.

Signed-off-by: NeilBrown <[email protected]>
Acked-by: Mimi Zohar <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

firmware_class: fix memory leak - free allocated pages

fix memory leak introduced by the patch 6e03a201bbe:
firmware: speed up request_firmware()

1. vfree won't release pages that were allocated explicitly and mapped
using vmap. The memory has to be vunmap-ed and the pages need
to be freed explicitly

2. page array is moved into the 'struct
firmware' so that we can free it from release_firmware()
and not only in fw_dev_release()

The fix doesn't break the firmware load speed.

Cc: Johannes Berg <[email protected]>
Cc: Ming Lei <[email protected]>
Cc: Catalin Marinas <[email protected]>
Signed-off-by: Kay Sievers <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
Signed-off-by: Tomas Winkler <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

drivers/base/cpu.c: fix the output from /sys/devices/system/cpu/offline

Without CONFIG_CPUMASK_OFFSTACK, simply inverting cpu_online_mask causes
CPUs beyond nr_cpu_ids to be displayed twice, and CPUs that are not even
possible to be displayed as offline.
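
The underlying arithmetic can be sketched with a single 64-bit word standing in for a cpumask. offline_mask_sketch is a hypothetical helper for illustration; the actual fix in drivers/base/cpu.c may be structured differently.

```c
#include <stdint.h>

/*
 * Compute the offline CPUs as "not online", restricted to the first
 * nr_cpu_ids bits. A bare ~online would also set every bit at or above
 * nr_cpu_ids, reporting CPUs that cannot exist.
 */
static uint64_t offline_mask_sketch(uint64_t online, unsigned int nr_cpu_ids)
{
    uint64_t valid = nr_cpu_ids >= 64
        ? ~UINT64_C(0)
        : (UINT64_C(1) << nr_cpu_ids) - 1;

    return ~online & valid;
}
```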

Signed-off-by: Jan Beulich <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Cc: stable <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: Tidy REMOTE_DEBUG

REMOTE_DEBUG already appears in the 2.2 kernel sources but was not a
config option in the initial git import (2.6.12-rc). It is currently
used in just one place in the Linux kernel and should probably be
dropped entirely.

Signed-off-by: Christoph Egger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: isicomm: handle running out of slots

This patch makes it return -ENODEV if we run out of empty slots in the
probe function. It's unlikely to happen, but it makes the static
checkers happy.

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: Use resource size to fix off-by-one error

Use the resource_size function instead of manually calculating the
resource size. This actually fixes an off-by-one error.
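
The off-by-one comes from resource ranges being inclusive at both ends. A minimal sketch, using a hypothetical struct resource_sketch rather than the kernel's struct resource:

```c
/* Hypothetical resource range; both start and end are inclusive. */
struct resource_sketch {
    unsigned long start;
    unsigned long end;
};

/* Size of an inclusive range: the "+ 1" is what a manual end - start misses. */
static unsigned long resource_size_sketch(const struct resource_sketch *res)
{
    return res->end - res->start + 1;
}
```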

Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: fix obsolete comment on tty_insert_flip_string_fixed_flag

Comment was not updated when tty_insert_flip_string was generalised.

Signed-off-by: Johan Hovold <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: Add driver for the Altera UART

Add a UART driver for the UART component available as a SOPC (System on
Programmable Chip) component for Altera FPGAs.

Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: Add driver for the Altera JTAG UART

Add a UART driver for the JTAG UART component available as a SOPC
(System on Programmable Chip) component for Altera FPGAs.

Signed-off-by: Tobias Klauser <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: timbuart: make sure last byte is sent when port is closed

Fix a problem in early versions of the FPGA IP.

In certain situations the IP reports that the FIFO is empty, but a byte is
still clocked out.  If a flush is done at that point the currently clocked
byte is canceled.

This causes incompatibilities with the upper layers: when a port is closed,
they wait until the FIFO is empty and then close the port.  During close
the FIFO is flushed, so the last byte is not sent properly.

Now the FIFO is only flushed if it is reported to be non-empty, which
allows the byte currently being clocked out to finish.

[[email protected]: fix build]
Signed-off-by: Richard Röjfors <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: two branches the same in timbuart_set_mctrl()

CTS is a read-only bit, and we should stop signalling RTS if the modem
line TIOCM_RTS is not set.

Thanks for suggestions by Richard Röjfors.

Signed-off-by: Roel Kluin <[email protected]>
Acked-by: Richard Röjfors <[email protected]>
Cc: Alan Cox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: uartlite: move from byte accesses to word accesses

Byte accesses to I/O devices in Xilinx IP will be less desirable
in the future, so the driver is being changed to use 32-bit
accesses.

This change facilitates using the uartlite IP over a PCIe bus
which only allows 32 bit accesses.

Signed-off-by: John Linn <[email protected]>
Tested-by: Michal Simek <[email protected]>
Acked-by: Grant Likely <[email protected]>
Acked-by: Peter Korsgaard <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: n_gsm: depends on NET

n_gsm uses skb functions, so it should depend on NET.

n_gsm.c:(.text+0x123d49): undefined reference to `skb_dequeue'
n_gsm.c:(.text+0x123d98): undefined reference to `kfree_skb'
n_gsm.c:(.text+0x123e1e): undefined reference to `skb_pull'

Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

tty: n_gsm line discipline

Add an implementation of GSM 0710 MUX. The implementation currently supports

- Basic and advanced framing (as either end of the link)
- UI or UIH data frames
- Adaption layer 1-4 (1 and 2 via tty, 3 and 4 as skbuff lists)
- Modem and control messages including the correct retry process
- Flow control

and exposes the MUX channels as a set of virtual tty devices including modem
signals. This is an experimental driver.

Signed-off-by: Alan Cox <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: TTY: new ldiscs for staging

Raise the maximum number of ldiscs by a few to allow ldiscs
to exist in the staging directory and elsewhere.

Signed-off-by: Pavan Savoy <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: drop redundant cpu depends

The BF54xM procs imply the related BF54x define, so no need to check both.

Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: drop the experimental markings

Should be stable now ...

Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: pull in bfin_sport.h for SPORT defines

Now that the SPORT MMR defines have been unified, switch over to it.

Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: only enable SPORT TX if data is to be sent

Rather than always turn on the SPORT TX interrupt, only do it when we've
actually queued up data for transmission. This avoids useless interrupt
processing.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: drop useless status masks

These were all copied over from the Blackfin UART driver, but they don't
make sense here because these bits are all specific to the Blackfin UART.

Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: zero sport_uart_port if allocated dynamically

Need to initialize the SPORT state rather than using random memory.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: protect changes to uart_port

Common serial API says we need to grab the port lock before modifying
the port state to prevent inconsistent state between threads.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: add support for CTS/RTS via GPIOs

Some people need flow control on their ports, so now boards can support
that via any GPIOs.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: rename early platform driver class string

Clarifies command line set up for devices between consoles and early
devices.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: add missing mapbase initialization

The driver doesn't care about this, but the common serial core wants it.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: remove unused peripheral pin lists

All the resources are in the boards files now.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: shorten the SPORT TX waiting loop

The waiting loop to stop SPORT TX from the TX interrupt is too long. This
may block the SPORT RX interrupts and cause the RX FIFO to overflow. So,
stop SPORT TX only after the last char in the TX FIFO has been moved into
the shift register.

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

serial: bfin_sport_uart: work around anomaly 05000473 (make 32bit fifo read atomic)

We cannot let a 32-bit RX FIFO read be interrupted otherwise a fake RX
underflow error might be generated.

URL: http://blackfin.uclinux.org/gf/tracker/5145

Signed-off-by: Sonic Zhang <[email protected]>
Signed-off-by: Mike Frysinger <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (23 commits)
  nilfs2: disallow remount of snapshot from/to a regular mount
  nilfs2: use huge_encode_dev/huge_decode_dev
  nilfs2: update comment on deactivate_super at nilfs_get_sb
  nilfs2: replace MS_VERBOSE with MS_SILENT
  nilfs2: add missing initialization of s_mode
  nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive
  nilfs2: enlarge s_volume_name member in nilfs_super_block
  nilfs2: use checkpoint number instead of timestamp to select super block
  nilfs2: add missing endian conversion on super block magic number
  nilfs2: make nilfs_sc_*_ops static
  nilfs2: add kernel doc comments to persistent object allocator functions
  nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info
  nilfs2: remove nilfs_segctor_init() in segment.c
  nilfs2: insert checkpoint number in segment summary header
  nilfs2: add a print message after loading nilfs2
  nilfs2: cleanup multi kmem_cache_{create,destroy} code
  nilfs2: move out checksum routines to segment buffer code
  nilfs2: move pointer to super root block into logs
  nilfs2: change default of 'errors' mount option to 'remount-ro' mode
  nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path()
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
  GFS2: Fix typo
  GFS2: stuck in inode wait, no glocks stuck
  GFS2: Eliminate useless err variable
  GFS2: Fix writing to non-page aligned gfs2_quota structures
  GFS2: Add some useful messages
  GFS2: fix quota state reporting
  GFS2: Various gfs2_logd improvements
  GFS2: glock livelock
  GFS2: Clean up stuffed file copying
  GFS2: docs update
  GFS2: Remove space from slab cache name

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: fix ast ordering for user locks
  dlm: cleanup remove unused code

Merge git://git.infradead.org/mtd-2.6

* git://git.infradead.org/mtd-2.6: (154 commits)
  mtd: cfi_cmdset_0002: use AMD standard command-set with Winbond flash chips
  mtd: cfi_cmdset_0002: Fix MODULE_ALIAS and linkage for new 0701 commandset ID
  mtd: mxc_nand: Remove duplicate NAND_CMD_RESET case value
  mtd: update gfp/slab.h includes
  jffs2: Stop triggering block erases from jffs2_write_super()
  jffs2: Rename jffs2_erase_pending_trigger() to jffs2_dirty_trigger()
  jffs2: Use jffs2_garbage_collect_trigger() to trigger pending erases
  jffs2: Require jffs2_garbage_collect_trigger() to be called with lock held
  jffs2: Wake GC thread when there are blocks to be erased
  jffs2: Erase pending blocks in GC pass, avoid invalid -EIO return
  jffs2: Add 'work_done' return value from jffs2_erase_pending_blocks()
  mtd: mtdchar: Do not corrupt backing device of device node inode
  mtd/maps/pcmciamtd: Fix printk format for ssize_t in debug messages
  drivers/mtd: Use kmemdup
  mtd: cfi_cmdset_0002: Fix argument order in bootloc warning
  mtd: nand: add Toshiba TC58NVG0 device ID
  pcmciamtd: add another ID
  pcmciamtd: coding style cleanups
  pcmciamtd: fixing obvious errors
  mtd: chips: add SST39WF160x NOR-flashes
  ...

Trivial conflicts due to dev_node removal in drivers/mtd/maps/pcmciamtd.c

Merge branch 'linux-next' of git://git.infradead.org/ubi-2.6

* 'linux-next' of git://git.infradead.org/ubi-2.6:
  UBI: misc comment fixes
  UBI: fix s/then/than/ typos
  UBI: init even if MTD device cannot be attached, if built into kernel
  UBI: remove reboot notifier

Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6

* 'linux-next' of git://git.infradead.org/ubifs-2.6:
  UBIFS: mark VFS SB RO too

Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs: (54 commits)
  xfs: mark xfs_iomap_write_ helpers static
  xfs: clean up end index calculation in xfs_page_state_convert
  xfs: clean up mapping size calculation in __xfs_get_blocks
  xfs: clean up xfs_iomap_valid
  xfs: move I/O type flags into xfs_aops.c
  xfs: kill struct xfs_iomap
  xfs: report iomap_bn in block base
  xfs: report iomap_offset and iomap_bsize in block base
  xfs: remove iomap_delta
  xfs: remove iomap_target
  xfs: limit xfs_imap_to_bmap to a single mapping
  xfs: simplify buffer to transaction matching
  xfs: Make fiemap work in query mode.
  xfs: kill off l_sectbb_mask
  xfs: record log sector size rather than log2(that)
  xfs: remove dead XFS_LOUD_RECOVERY code
  xfs: removed unused XFS_QMOPT_ flags
  xfs: remove a few macro indirections in the quota code
  xfs: access quotainfo structure directly
  xfs: wait for direct I/O to complete in fsync and write_inode
  ...

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (47 commits)
  ocfs2: Silence a gcc warning.
  ocfs2: Don't retry xattr set in case value extension fails.
  ocfs2:dlm: avoid dlm->ast_lock lockres->spinlock dependency break
  ocfs2: Reset xattr value size after xa_cleanup_value_truncate().
  fs/ocfs2/dlm: Use kstrdup
  fs/ocfs2/dlm: Drop memory allocation cast
  Ocfs2: Optimize punching-hole code.
  Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.
  Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.
  Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.
  ocfs2: Block signals for mkdir/link/symlink/O_CREAT.
  ocfs2: Wrap signal blocking in void functions.
  ocfs2/dlm: Increase o2dlm lockres hash size
  ocfs2: Make ocfs2_extend_trans() really extend.
  ocfs2/trivial: Code cleanup for allocation reservation.
  ocfs2: make ocfs2_adjust_resv_from_alloc simple.
  ocfs2: Make nointr a default mount option
  ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE
  o2net: log socket state changes
  ocfs2: print node # when tcp fails
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (182 commits)
  [SCSI] aacraid: add an ifdef'd device delete case instead of taking the device offline
  [SCSI] aacraid: prohibit access to array container space
  [SCSI] aacraid: add support for handling ATA pass-through commands.
  [SCSI] aacraid: expose physical devices for models with newer firmware
  [SCSI] aacraid: respond automatically to volumes added by config tool
  [SCSI] fcoe: fix fcoe module ref counting
  [SCSI] libfcoe: FIP Keep-Alive messages for VPorts are sent with incorrect port_id and wwn
  [SCSI] libfcoe: Fix incorrect MAC address clearing
  [SCSI] fcoe: fix a circular locking issue with rtnl and sysfs mutex
  [SCSI] libfc: Move the port_id into lport
  [SCSI] fcoe: move link speed checking into its own routine
  [SCSI] libfc: Remove extra pointer check
  [SCSI] libfc: Remove unused fc_get_host_port_type
  [SCSI] fcoe: fixes wrong error exit in fcoe_create
  [SCSI] libfc: set seq_id for incoming sequence
  [SCSI] qla2xxx: Updates to ISP82xx support.
  [SCSI] qla2xxx: Optionally disable target reset.
  [SCSI] qla2xxx: ensure flash operation and host reset via sg_reset are mutually exclusive
  [SCSI] qla2xxx: Silence bogus warning by gcc for wrap and did.
  [SCSI] qla2xxx: T10 DIF support added.
  ...

HID: fix up 'EMBEDDED' mess in Kconfig

The whole point of making some of the drivers automatically selected
unless 'EMBEDDED' is set was to handle quirks transparently after their
separation from the generic core.

Over time, some of the later-added quirks grew into more standalone drivers,
implementing non-trivial features and being larger than a few bytes of code.

In addition to that, some of the standalone drivers don't make sense for
99.9% of the users, as they are very specific to rare devices.

Therefore build in by default only those drivers which

- we historically used to support even before quirk separation from the
  core code
- are isolated enough and likely to hit quite a large portion of the
  users anyway (Microsoft, Logitech)

Reported-by: Linus Torvalds <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>

powerpc: Remove unused 'protect4gb' boot parameter

The 'protect4gb' boot parameter was introduced in 2007 to avoid allocating
DMA space crossing the 4GB boundary (commit
569975591c5530fdc9c7a3c45122e5e46f075a74).

In 2008, the IOMMU was fixed to use the per-device boundary_mask parameter
properly, so the 'protect4gb' workaround was removed (commit
383af9525bb27f927511874f6306247ec13f1c28). But somehow I missed the
'protect4gb' boot parameter that was used to enable the
workaround.

Signed-off-by: FUJITA Tomonori <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc: Build-in e1000e for pseries & ppc64_defconfig

The e1000e device is becoming more common these days, so let's just
build it in for pseries & ppc64_defconfig.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc/pseries: Make request_ras_irqs() available to other pseries code

At the moment only the RAS code uses event-sources interrupts (for EPOW
events and internal errors) so request_ras_irqs() (which actually requests
the event-sources interrupts) is found in ras.c and is static.

We want to be able to use event-sources interrupts in other pseries code,
so let's rename request_ras_irqs() to request_event_sources_irqs() and
move it to event_sources.c.

This will be used in an upcoming patch that adds support for IO Event
interrupts that come through as event sources.

Signed-off-by: Mark Nelson <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc/numa: Use ibm,architecture-vec-5 to detect form 1 affinity

I've been told that the architected way to determine we are in form 1
affinity mode is by reading the ibm,architecture-vec-5 property which
mirrors the layout of the fifth vector of the ibm,client-architecture
structure.

Eventually we may want to parse the ibm,architecture-vec-5 and create
FW_FEATURE_* bits.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc/numa: Set a smaller value for RECLAIM_DISTANCE to enable zone reclaim

I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets
enabled via this:

        /*
         * If another node is sufficiently far away then it is better
         * to reclaim pages in a zone before going off node.
         */
        if (distance > RECLAIM_DISTANCE)
                zone_reclaim_mode = 1;

Since we use the default value of 20 for REMOTE_DISTANCE and 20 for
RECLAIM_DISTANCE it never kicks in.

The local to remote bandwidth ratios can be quite large on System p
machines so it makes sense for us to reclaim clean pagecache locally before
going off node.

The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables
zone reclaim.
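
A minimal sketch of why the check quoted above never fires with the default
values. zone_reclaim_enabled is a hypothetical helper that takes the
threshold as a parameter; the kernel uses the RECLAIM_DISTANCE macro
directly. The value 10 used in the note below is only an assumed example of
"a smaller value":

```c
#include <assert.h>
#include <stdbool.h>

#define REMOTE_DISTANCE 20	/* default node distance quoted above */

/* Mirrors the quoted check: zone reclaim is enabled only if the node is
 * strictly farther away than the threshold. */
bool zone_reclaim_enabled(int distance, int reclaim_distance)
{
	assert(distance >= 0 && reclaim_distance >= 0);
	return distance > reclaim_distance;
}
```

With a threshold of 20, a distance of 20 is not strictly greater, so reclaim
stays off; with an assumed smaller threshold such as 10, the same distance
enables it.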

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc: Use smt_snooze_delay=-1 to always busy loop

Right now if we want to busy loop and not give up any time to the hypervisor
we put a very large value into smt_snooze_delay. This is sometimes useful
when running a single partition and you want to avoid any latencies due
to the hypervisor or CPU power state transitions. While this works, it's a bit
ugly: how big a number is enough now that we have NO_HZ and can be idle for a
very long time?

The patch below makes smt_snooze_delay signed, and a negative value means loop
forever:

echo -1 > /sys/devices/system/cpu/cpu0/smt_snooze_delay
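
A rough sketch of the signed-delay semantics, under the assumption that the
idle loop compares elapsed time against the delay. keep_snoozing and its
parameters are hypothetical and do not reflect the actual idle loop code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: should the idle loop keep spinning?
 * 'delay' and 'elapsed' share a unit; a negative delay means the loop
 * never gives up. */
bool keep_snoozing(long delay, unsigned long elapsed)
{
	if (delay < 0)
		return true;
	return elapsed < (unsigned long)delay;
}
```

Because the negative case is tested first, -1 is never converted to a huge
unsigned number, so no "large enough" value has to be chosen.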

This change shouldn't affect the existing userspace tools (eg ppc64_cpu), but
I'm cc-ing Nathan just to be sure.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc: Remove check of ibm,smt-snooze-delay OF property

I'm not sure why we have code for parsing an ibm,smt-snooze-delay OF
property. Since we have a smt-snooze-delay= boot option and we can
also set it at runtime via sysfs, it should be safe to get rid of
this code.

Signed-off-by: Anton Blanchard <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc/kdump: Fix race in kdump shutdown

When we are crashing, the crashing/primary CPU IPIs the secondaries to
turn off IRQs, go into real mode and wait in kexec_wait.  While this
is happening, the primary tears down all the MMU maps.  Unfortunately
the primary doesn't check to make sure the secondaries have entered
real mode before doing this.

On PHYP machines, the secondaries can take a long time shutting down
the IRQ controller as RTAS calls are needed.  These RTAS calls need to
be serialised, which results in the secondaries contending in
lock_rtas() and hence taking a long time to shut down.

We've hit this on large POWER7 machines, where some secondaries are
still waiting in lock_rtas(), when the primary tears down the HPTEs.

This patch makes sure all secondaries are in real mode before the
primary tears down the MMU.  It uses the new kexec_state entry in the
paca.  It times out if the secondaries don't reach real mode after
10sec.
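
A simplified sketch of the wait-with-timeout idea. It assumes a per-CPU
state array in place of the paca's kexec_state and a poll budget in place of
the 10 second wall-clock timeout. NCPUS, STATE_REAL_MODE and
wait_for_secondaries are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4			/* hypothetical CPU count */
#define STATE_REAL_MODE 2	/* hypothetical value for "in real mode" */

/* Poll every secondary's state up to max_polls times; return true once
 * all of them report real mode, false if the budget runs out first. */
bool wait_for_secondaries(const volatile int state[NCPUS], int primary,
			  long max_polls)
{
	long polls;
	int cpu;

	assert(primary >= 0 && primary < NCPUS);
	for (polls = 0; polls < max_polls; polls++) {
		bool all_ready = true;

		for (cpu = 0; cpu < NCPUS; cpu++) {
			if (cpu != primary && state[cpu] < STATE_REAL_MODE)
				all_ready = false;
		}
		if (all_ready)
			return true;
	}
	return false;	/* timed out; the caller decides how to proceed */
}
```

The primary would tear down the MMU mappings only after this returns, so a
secondary that is still spinning on a lock is not caught with its mappings
removed.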

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>

powerpc/kexec: Fix race in kexec shutdown

In kexec_prepare_cpus, the primary CPU IPIs the secondary CPUs to
kexec_smp_down().  kexec_smp_down() calls kexec_smp_wait() which sets
the hw_cpu_id() to -1.  The primary does this while leaving IRQs on
which means the primary can take a timer interrupt, which can lead to
it IPIing one of the secondary CPUs (say, for a scheduler re-balance)
but since the secondary CPU now has a hw_cpu_id = -1, we IPI CPU
-1... Kaboom!

We are hitting this case regularly on POWER7 machines.

There is also a second race, where the primary will tear down the MMU
mappings before knowing the secondaries have entered real mode.

Also, the secondaries are clearing out any pending IPIs before
guaranteeing that no more will be received.

This changes kexec_prepare_cpus() so that we turn off IRQs in the
primary CPU much earlier.  It adds a paca flag to say that the
secondaries have entered the kexec_smp_down() IPI and turned off IRQs,
rather than overloading hw_cpu_id with -1.  This new paca flag is
also used to indicate when the secondaries have entered real mode.

It also ensures that all CPUs have their IRQs off before we clear out
any pending IPI requests (in kexec_cpu_down()) to ensure there are no
trailing IPIs left unacknowledged.

Signed-off-by: Michael Neuling <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>