Wu Zhangjin [Wed, 27 Oct 2010 10:59:08 +0000 (18:59 +0800)]
ftrace/MIPS: Add module support for C version of recordmcount
Since MIPS modules' address space differs from the core kernel space, to access
the _mcount in the core kernel, the kernel functions in modules must use long
call (-mlong-calls): load the _mcount address into one register and jump to the
address stored by the register:
In the old Perl version of recordmcount, we only need to record the position of
the 1st R_MIPS_HI16 type of _mcount, and later, in ftrace_make_nop(), replace
the instruction in this position by a "b label" and in ftrace_make_call(),
replace it back.
But, the default C version of recordmcount records all of the _mcount symbols,
so, we must filter the 2nd _mcount like the Perl version of recordmcount does.
The C version of recordmcount copes with the symbols before they are linked, So
It doesn't know the type of the symbols and therefore can not filter the
symbols as the Perl version of recordmcount does. But as we can see above, the
2nd _mcount symbols of the long call alawys follows the 1st _mcount symbol of
the same long call, which means the offset from the 1st to the 2nd is fixed, it
is 0x10-0xc = 4 here, 4 is the length of the 1st load instruciton, for MIPS has
fixed length of instructions, this offset is always 4.
And as we know, the _mcount is inserted into the entry of every kernel
function, the offset between the other _mcount's is expected to be always
bigger than 4. So, to filter the 2ns _mcount symbol of the long call, we can
simply check the offset between two _mcount symbols, If it is 4, then, filter
the 2nd _mcount symbol.
To avoid touching too much code, an 'empty' function fn_is_fake_mcount() is
added for all of the archs, and the specific archs can override it via chaning
the function pointer: is_fake_mcount in do_file() with the e_machine. e.g. This
patch adds MIPS_is_fake_mcount() to override the default fn_is_fake_mcount()
pointed by is_fake_mcount.
This fn_is_fake_mcount() checks if the _mcount symbol is fake, e.g. the 2nd
_mcount symbol of the long call is fake, for there are 2 _mcount symbols mapped
to one real mcount call, so, one of them is fake and must be filtered.
This fn_is_fake_mcount() is called in sift_rel_mcount() after finding the
_mcount symbols and before adding the _mcount symbol into mrelp, so, it can
prevent the fake mcount symbol going into the last __mcount_loc table.
John Reiser [Wed, 27 Oct 2010 10:59:07 +0000 (18:59 +0800)]
ftrace/MIPS: Add MIPS64 support for C version of recordmcount
MIPS64 has 'weird' Elf64_Rel.r_info[1,2], which must be used instead of
the generic Elf64_Rel.r_info, otherwise, the C version of recordmcount
will not work for "segmentation fault".
Usage of "union mips_r_info" and the functions MIPS64_r_sym() and
MIPS64_r_info() written by Maciej W. Rozycki <[email protected]>
David Daney [Thu, 14 Oct 2010 18:32:33 +0000 (11:32 -0700)]
MIPS: Make TASK_SIZE reflect proper size for both 32 and 64 bit processes.
The TASK_SIZE macro should reflect the size of a user process virtual
address space. Previously for 64-bit kernels, this was not the case.
The immediate cause of pain was in
hugetlbfs/inode.c:hugetlb_get_unmapped_area() where 32-bit processes
trying to mmap a huge page would be served a page with an address
outside of the 32-bit address range. But there are other uses of
TASK_SIZE in the kernel as well that would like an accurate value.
The new definition is nice because it now makes TASK_SIZE and
TASK_SIZE_OF() yield the same value for any given process.
For 32-bit kernels there should be no change, although I did factor
out some code in asm/processor.h that became identical for the 32-bit and
64-bit cases.
__UA_LIMIT is now set to ~((1 << SEGBITS) - 1) for 64-bit kernels.
This should eliminate the possibility of getting a
AddressErrorException in the kernel for addresses that pass the
access_ok() test.
With the patch applied, I can still run o32, n32 and n64 processes,
and have an o32 shell fork/exec both n32 and n64 processes.
Kevin Cernekee [Sat, 16 Oct 2010 21:22:38 +0000 (14:22 -0700)]
MIPS: Allow UserLocal on MIPS_R1 processors
Some MIPS32R1 processors implement UserLocal (RDHWR $29) to accelerate
programs that make extensive use of thread-local storage. Therefore,
setting up the HWRENA register should not depend on cpu_has_mips_r2.
Kevin Cernekee [Thu, 21 Oct 2010 03:05:42 +0000 (20:05 -0700)]
MIPS: Honor L2 bypass bit
On many of the newer MIPS32 cores, CP0 CONFIG2 bit 12 (L2B) indicates
that the L2 cache is disabled and therefore Linux should not attempt
to use it.
[Ralf: Moved the code added by Kevin's original patch into a separate
function that can easily be replaced for platforms that need more a
different probe.]
Kevin Cernekee [Sat, 16 Oct 2010 21:22:30 +0000 (14:22 -0700)]
MIPS: Decouple BMIPS CPU support from bcm47xx/bcm63xx SoC code
BMIPS processor cores are used in 50+ different chipsets spread across
5+ product lines. In many cases the chipsets do not share the same
peripheral register layouts, the same register blocks, the same
interrupt controllers, the same memory maps, or much of anything else.
But, across radically different SoCs that share nothing more than the
same BMIPS CPU, a few things are still mostly constant:
SMP operations
Access to performance counters
DMA cache coherency quirks
Cache and memory bus configuration
So, it makes sense to treat each BMIPS processor type as a generic
"building block," rather than tying it to a specific SoC. This makes it
easier to support a large number of BMIPS-based chipsets without
unnecessary duplication of code, and provides the infrastructure needed
to support BMIPS-proprietary features.
Deng-Cheng Zhu [Tue, 12 Oct 2010 11:37:24 +0000 (19:37 +0800)]
MIPS: Add support for hardware performance events (mipsxx)
This patch adds the mipsxx Perf-events support based on the skeleton.
Generic hardware events and cache events are now fully implemented for
the 24K/34K/74K/1004K cores. To support other cores in mipsxx (such as
R10000/SB1), the generic hardware event tables and cache event tables
need to be filled out. To support other CPUs which have different PMU
than mipsxx, such as RM9000 and LOONGSON2, the additional files
perf_event_$cpu.c need to be created.
Raw event is an important part of Perf-events. It helps the user collect
performance data for events that are not listed as the generic hardware
events and cache events but ARE supported by the CPU's PMU.
This patch also adds this feature for mipsxx 24K/34K/74K/1004K. For how to
use it, please refer to processor core software user's manual and the
comments for mipsxx_pmu_map_raw_event() for more details.
Please note that this is a "precise" implementation, which means the
kernel will check whether the requested raw events are supported by this
CPU and which hardware counters can be assigned for them.
To test the functionality of Perf-event, you may want to compile the tool
"perf" for your MIPS platform. You can refer to the following URL:
http://www.linux-mips.org/archives/linux-mips/2010-10/msg00126.html
You also need to customize the CFLAGS and LDFLAGS in tools/perf/Makefile
for your libs, includes, etc.
In case you encounter the boot failure in SMVP kernel on multi-threading
CPUs, you may take a look at:
http://www.linux-mips.org/git?p=linux-mti.git;a=commitdiff;h=5460815027d802697b879644c74f0e8365254020
Deng-Cheng Zhu [Tue, 12 Oct 2010 11:37:23 +0000 (19:37 +0800)]
MIPS: Perf-events: Add callchain support
Adds callchain support for MIPS Perf-events. For more info on this feature,
please refer to tools/perf/Documentation/perf-report.txt and
tools/perf/design.txt.
Currently userspace callchain data is not recorded, because we do not have
a safe way to do this.
Deng-Cheng Zhu [Tue, 12 Oct 2010 11:37:22 +0000 (19:37 +0800)]
MIPS: add support for hardware performance events (skeleton)
This patch provides the skeleton of the HW perf event support. To enable
this feature, we can not choose the SMTC kernel; Oprofile should be
disabled; kernel performance events be selected. Then we can enable it in
Kernel type menu.
Oprofile for MIPS platforms initializes irq at arch init time. Currently
we do not change this logic to allow PMU reservation.
If a platform has EIC, we can use the irq base and perf counter irq offset
defines for the interrupt controller in specific init_hw_perf_events().
Based on this skeleton patch, the 3 different kinds of MIPS PMU, namely,
mipsxx/loongson2/rm9000, can be supported by adding corresponding lower
level C files at the bottom. The suggested names of these files are
perf_event_mipsxx.c/perf_event_loongson2.c/perf_event_rm9000.c. So, for
example, we can do this by adding "#include perf_event_mipsxx.c" at the
bottom of perf_event.c.
In addition, PMUs with 64bit counters are also considered in this patch.
Deng-Cheng Zhu [Tue, 12 Oct 2010 11:37:21 +0000 (19:37 +0800)]
MIPS: add support for software performance events
Software events are required as part of the measurable stuff by the
Linux performance counter subsystem. Here is the list of events added by
this patch:
PERF_COUNT_SW_PAGE_FAULTS
PERF_COUNT_SW_PAGE_FAULTS_MIN
PERF_COUNT_SW_PAGE_FAULTS_MAJ
PERF_COUNT_SW_ALIGNMENT_FAULTS
PERF_COUNT_SW_EMULATION_FAULTS
Deng-Cheng Zhu [Tue, 12 Oct 2010 11:37:20 +0000 (19:37 +0800)]
MIPS: define local_xchg from xchg_local to atomic_long_xchg
Perf-events is now using local_t helper functions internally. There is a
use of local_xchg(). On MIPS, this is defined to xchg_local() which is
missing in asm/system.h. This patch re-defines local_xchg() in asm/local.h
to atomic_long_xchg(). Then Perf-events can pass the build.
Florian Fainelli [Sun, 29 Aug 2010 15:08:44 +0000 (17:08 +0200)]
MIPS: AR7: Add support for Titan (TNETV10xx) SoC variant
Add support for Titan TNETV1050,1055,1056,1060 variants. This SoC is almost
completely identical to AR7 except on a few points:
- a second bank of gpios is available
- vlynq0 on titan is vlynq1 on ar7
- different PHY addresses for cpmac0
This SoC can be found on commercial products like the Linksys WRTP54G
Original patch by Xin with improvments by Florian.
David Daney [Fri, 8 Oct 2010 21:47:52 +0000 (14:47 -0700)]
USB: Add EHCI and OHCH glue for OCTEON II SOCs.
The OCTEON II SOC has USB EHCI and OHCI controllers connected directly
to the internal I/O bus. This patch adds the necessary 'glue' logic
to allow ehci-hcd and ohci-hcd drivers to work on OCTEON II.
The OCTEON normally runs big-endian, and the ehci/ohci internal
registers have host endianness, so we need to select
USB_EHCI_BIG_ENDIAN_MMIO.
The ehci and ohci blocks share a common clocking and PHY
infrastructure. Initialization of the host controller and PHY clocks
is common between the two and is factored out into the
octeon2-common.c file.
Setting of USB_ARCH_HAS_OHCI and USB_ARCH_HAS_EHCI is done in
arch/mips/Kconfig in a following patch.
David Daney [Thu, 7 Oct 2010 23:03:53 +0000 (16:03 -0700)]
MIPS: Octeon: Apply CN63XXP1 errata workarounds.
The CN63XXP1 needs a couple of workarounds to ensure memory is not written
in unexpected ways.
All PREF with hints in the range 0-4,6-24 are replaced with PREF 28. We
pass a flag to the assembler to cover compiler generated code, and patch
uasm for the dynamically generated code.
David Daney [Thu, 7 Oct 2010 23:03:52 +0000 (16:03 -0700)]
WATCHDOG: octeon-wdt: Use I/O clock rate for timing calculations.
The creation of the I/O clock domain requires some adjustments. Since
the watchdog counters are clocked by the I/O clock, use its rate for
timing calculations.
David Daney [Thu, 7 Oct 2010 23:03:51 +0000 (16:03 -0700)]
ATA: pata_octeon_cf: Use I/O clock rate for timing calculations.
The creation of the I/O clock domain requires some adjustments. Since the
CF bus timing logic is clocked by the I/O clock, use its rate for delay
calculations.
David Daney [Thu, 7 Oct 2010 23:03:49 +0000 (16:03 -0700)]
MIPS: Octeon: Add octeon_get_io_clock_rate() for cn63xx
Starting with cn63xx Octeon I/O blocks are clocked at a different rate
than the CPU. Add a new function octeon_get_io_clock_rate() that
yields the I/O clock rate.
Also rearrange octeon_get_clock_rate() to get the value from the saved
sysinfo structure.
David Daney [Fri, 1 Oct 2010 20:27:34 +0000 (13:27 -0700)]
MIPS: Octeon: Rewrite DMA mapping functions.
All Octeon chips can support more than 4GB of RAM. Also due to how Octeon
PCI is setup, even some configurations with less than 4GB of RAM will have
portions that are not accessible from 32-bit devices.
Enable the swiotlb code to handle the cases where a device cannot directly
do DMA. This is a complete rewrite of the Octeon DMA mapping code.
David Daney [Fri, 1 Oct 2010 20:27:32 +0000 (13:27 -0700)]
MIPS: Convert DMA to use dma-mapping-common.h
Use asm-generic/dma-mapping-common.h to handle all DMA mapping operations
and establish a default get_dma_ops() that forwards all operations to the
existing code.
Augment dev_archdata to carry a pointer to the struct dma_map_ops, allowing
DMA operations to be overridden on a per device basis. Currently this is
never filled in, so the default dma_map_ops are used. A follow-on patch
sets this for Octeon PCI devices.
Also initialize the dma_debug system as it is now used if it is configured.
David Daney [Fri, 1 Oct 2010 20:27:31 +0000 (13:27 -0700)]
MIPS: ip32, ip27, jazz: Make static functions in dma-coherence.h inline.
Any function defined in a header file should be inline. This helps us
avoid 'unused' compiler warnings when we include the files in more
places in subsequent patches.
David Daney [Fri, 1 Oct 2010 20:27:29 +0000 (13:27 -0700)]
MIPS: Octeon: Adjust top of DMA32 zone.
On OCTEON, we reserve the last 256MB of 32-bit PCI address space, mapping
the RAM in this region at a high DMA address. This makes memory in this
region unavailable for 32-bit DMA.
Ralf Baechle [Fri, 29 Oct 2010 18:08:24 +0000 (19:08 +0100)]
MIPS: Get rid of branches to .subsections.
It was a nice optimization - on paper at least. In practice it results in
branches that may exceed the maximum legal range for a branch. We can
fight that problem with -ffunction-sections but -ffunction-sections again
is incompatible with -pg used by the function tracer.
By rewriting the loop around all simple LL/SC blocks to C we reduce the
amount of inline assembler and at the same time allow GCC to often fill
the branch delay slots with something sensible or whatever else clever
optimization it may have up in its sleeve.
With this optimization gone we also no longer need -ffunction-sections,
so drop it.
Eric Dumazet [Fri, 29 Oct 2010 17:59:40 +0000 (19:59 +0200)]
netfilter: fix nf_conntrack_l4proto_register()
While doing __rcu annotations work on net/netfilter I found following
bug. On some arches, it is possible we publish a table while its content
is not yet committed to memory, and lockless reader can dereference wild
pointer.
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
cifs: Cleanup and thus reduce smb session structure and fields used during authentication
NTLM auth and sign - Use appropriate server challenge
cifs: add kfree() on error path
NTLM auth and sign - minor error corrections and cleanup
NTLM auth and sign - Use kernel crypto apis to calculate hashes and smb signatures
NTLM auth and sign - Define crypto hash functions and create and send keys needed for key exchange
cifs: cifs_convert_address() returns zero on error
NTLM auth and sign - Allocate session key/client response dynamically
cifs: update comments - [s/GlobalSMBSesLock/cifs_file_list_lock/g]
cifs: eliminate cifsInodeInfo->write_behind_rc (try #6)
[CIFS] Fix checkpatch warnings and bump cifs version number
cifs: wait for writeback to complete in cifs_flush
cifs: convert cifsFileInfo->count to non-atomic counter
Linus Torvalds [Fri, 29 Oct 2010 17:36:49 +0000 (10:36 -0700)]
readv/writev: do the same MAX_RW_COUNT truncation that read/write does
We used to protect against overflow, but rather than return an error, do
what read/write does, namely to limit the total size to MAX_RW_COUNT.
This is not only more consistent, but it also means that any broken
low-level read/write routine that still keeps counts in 'int' can't
break.
H. Peter Anvin [Thu, 28 Oct 2010 04:09:15 +0000 (21:09 -0700)]
x86, ftrace: Use safe noops, drop trap test
Always use a safe 5-byte noop sequence. Drop the trap test, since it
is known to return false negatives on some virtualization platforms on
32 bits. The resulting code is both simpler and safer.
[SCSI] pmcraid: add support for set timestamp command and other fixes
The following are the fixes in this patch:
1. Added support of set timestamp command in the driver
2. Pass all status code to mgmt application. Earlier we were passing
only failed ones.
3. Call class_destroy after unregister_chrdev and pci_unregister_driver
David Miller [Sat, 23 Oct 2010 18:06:24 +0000 (11:06 -0700)]
jump_label: Fix unaligned traps on sparc.
The vmlinux.lds.h knobs to emit the __jump_table section in the main
kernel image takes care to align the section, but this doesn't help
for the __jump_table section that gets emitted into modules.
Fix the resulting lack of section alignment by explicitly specifying
it in the assembler.
Steven Rostedt [Fri, 29 Oct 2010 15:02:43 +0000 (11:02 -0400)]
jump label: Make arch_jump_label_text_poke_early() optional
Some archs do not need to do anything special for jump labels on
startup (like MIPS). This patch adds a weak function stub for
arch_jump_label_text_poke_early();
Steven Rostedt [Mon, 18 Oct 2010 14:38:58 +0000 (10:38 -0400)]
jump label: Fix error with preempt disable holding mutex
Kprobes and jump label were having a race between mutexes that
was fixed by reordering the jump label. But this reordering
moved the jump label mutex into a preempt disable location.
This patch does a little fiddling to move the grabbing of
the jump label mutex from inside the preempt disable section
and still keep the order correct between the mutex and the
kprobes lock.
Linus Torvalds [Fri, 29 Oct 2010 15:49:18 +0000 (08:49 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] fix kprobes single stepping
[S390] tape: fix dbf usage
[S390] dasd: provide a Sense Path Group ID ioctl
[S390] ftrace: select HAVE_C_RECORDMCOUNT
[S390] vdso: get rid of redefinition warnings
[S390] facility detection: remove unused variable
[S390] hypfs: Fix error handling in hypfs_diag initialization
[S390] topology: fix cpu masks for topology=off case
[S390] topology: add SCHED_MC config option
[S390] Kconfig: add machine type number to code generation options
[S390] Add z196 machine type to setup_hwcaps
Linus Torvalds [Fri, 29 Oct 2010 15:47:36 +0000 (08:47 -0700)]
Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
md: tidy up device searches in read_balance.
md/raid1: fix some typos in comments.
md/raid1: discard unused variable.
md: unplug writes to external bitmaps.
md: use separate bio pool for each md device.
md: change type of first arg to sync_page_io.
md/raid1: perform mem allocation before disabling writes during resync.
md: use bio_kmalloc rather than bio_alloc when failure is acceptable.
md: Fix possible deadlock with multiple mempool allocations.
md: fix and update workqueue usage
md: use sector_t in bitmap_get_counter
md: remove md_mutex locking.
md: Fix regression with raid1 arrays without persistent metadata.
Roberto Sassu [Wed, 6 Oct 2010 16:31:32 +0000 (18:31 +0200)]
ecryptfs: added ecryptfs_mount_auth_tok_only mount parameter
This patch adds a new mount parameter 'ecryptfs_mount_auth_tok_only' to
force ecryptfs to use only authentication tokens which signature has
been specified at mount time with parameters 'ecryptfs_sig' and
'ecryptfs_fnek_sig'. In this way, after disabling the passthrough and
the encrypted view modes, it's possible to make available to users only
files encrypted with the specified authentication token.
Roberto Sassu [Wed, 6 Oct 2010 16:31:15 +0000 (18:31 +0200)]
ecryptfs: checking return code of ecryptfs_find_auth_tok_for_sig()
This patch replaces the check of the 'matching_auth_tok' pointer with
the exit status of ecryptfs_find_auth_tok_for_sig().
This avoids to use authentication tokens obtained through the function
ecryptfs_keyring_auth_tok_for_sig which are not valid.
Roberto Sassu [Wed, 6 Oct 2010 16:31:06 +0000 (18:31 +0200)]
ecryptfs: release keys loaded in ecryptfs_keyring_auth_tok_for_sig()
This patch allows keys requested in the function
ecryptfs_keyring_auth_tok_for_sig()to be released when they are no
longer required. In particular keys are directly released in the same
function if the obtained authentication token is not valid.
Further, a new function parameter 'auth_tok_key' has been added to
ecryptfs_find_auth_tok_for_sig() in order to provide callers the key
pointer to be passed to key_put().
eCryptfs: Clear LOOKUP_OPEN flag when creating lower file
eCryptfs was passing the LOOKUP_OPEN flag through to the lower file
system, even though ecryptfs_create() doesn't support the flag. A valid
filp for the lower filesystem could be returned in the nameidata if the
lower file system's create() function supported LOOKUP_OPEN, possibly
resulting in unencrypted writes to the lower file.
However, this is only a potential problem in filesystems (FUSE, NFS,
CIFS, CEPH, 9p) that eCryptfs isn't known to support today.
Roberto Sassu [Tue, 5 Oct 2010 16:53:45 +0000 (18:53 +0200)]
ecryptfs: call vfs_setxattr() in ecryptfs_setxattr()
Ecryptfs is a stackable filesystem which relies on lower filesystems the
ability of setting/getting extended attributes.
If there is a security module enabled on the system it updates the
'security' field of inodes according to the owned extended attribute set
with the function vfs_setxattr(). When this function is performed on a
ecryptfs filesystem the 'security' field is not updated for the lower
filesystem since the call security_inode_post_setxattr() is missing for
the lower inode.
Further, the call security_inode_setxattr() is missing for the lower inode,
leading to policy violations in the security module because specific
checks for this hook are not performed (i. e. filesystem
'associate' permission on SELinux is not checked for the lower filesystem).
This patch replaces the call of the setxattr() method of the lower inode
in the function ecryptfs_setxattr() with vfs_setxattr().
Chris Mason [Thu, 28 Oct 2010 19:30:42 +0000 (15:30 -0400)]
Btrfs: fix raid code for removing missing drives
When btrfs is mounted in degraded mode, it has some internal structures
to track the missing devices. This missing device is setup as readonly,
but the mapping code can get upset when we try to write to it.
This changes the mapping code to return -EIO instead of oops when we try
to write to the readonly device.
Miao Xie [Wed, 27 Oct 2010 00:57:29 +0000 (20:57 -0400)]
Btrfs: Switch the extent buffer rbtree into a radix tree
This patch reduces the CPU time spent in the extent buffer search by using the
radix tree instead of the rbtree and using the rcu lock instead of the spin
lock.
I did a quick test by the benchmark tool[1] and found the patch improve the
file creation/deletion performance problem that I have reported[2].
Before applying this patch:
Create files:
Total files: 50000
Total time: 0.971531
Average time: 0.000019
Delete files:
Total files: 50000
Total time: 1.366761
Average time: 0.000027
After applying this patch:
Create files:
Total files: 50000
Total time: 0.927455
Average time: 0.000019
Delete files:
Total files: 50000
Total time: 1.292280
Average time: 0.000026
Chris Mason [Tue, 26 Oct 2010 17:40:45 +0000 (13:40 -0400)]
Btrfs: use the flusher threads for delalloc throttling
We have a fairly complex set of loops around walking our list of
delalloc inodes when we find metadata delalloc space running low.
It doesn't work very well, can use large amounts of CPU and doesn't
do very efficient writeback.
This switches us to kick the bdi flusher threads instead. All dirty
data in btrfs is accounted as delalloc data, so this is very similar
in terms of what it writes, but we're able to just kick off the IO
and wait for progress.
Chris Mason [Tue, 26 Oct 2010 17:37:56 +0000 (13:37 -0400)]
Btrfs: tune the chunk allocation to 5% of the FS as metadata
An earlier commit tried to keep us from allocating too many
empty metadata chunks. It was somewhat too restrictive and could
lead to ENOSPC errors on empty filesystems.
This increases the limits to about 5% of the FS size, allowing more
metadata chunks to be preallocated.
Chris Mason [Fri, 29 Oct 2010 15:16:17 +0000 (11:16 -0400)]
Add new functions for triggering inode writeback
When btrfs is running low on metadata space, it needs to force delayed
allocation pages to disk. It currently does this with a suboptimal walk
of a private list of inodes with delayed allocation, and it would be
much better if we used the generic flusher threads.
writeback_inodes_sb_if_idle would be ideal, but it waits for the flusher
thread to start IO on all the dirty pages in the FS before it returns.
This adds variants of writeback_inodes_sb* that allow the caller to
control how many pages get sent down.
Fix kprobes after git commit 1e54622e0403891b10f2105663e0f9dd595a1f17
broke it. The kprobe_handler is now called with interrupts in the state
at the time of the breakpoint. The single step of the replaced instruction
is done with interrupts off which makes it necessary to enable and disable
the interupts in the kprobes code.
Stefan Weinhuber [Fri, 29 Oct 2010 14:50:43 +0000 (16:50 +0200)]
[S390] dasd: provide a Sense Path Group ID ioctl
The BIODASDSNID ioctl executes a 'Sense Path Group ID'
command on a DASD ECKD device. The returned path group data
allows user space programs to determine path state and
path group ID of the channel paths to the device.
Heiko Carstens [Fri, 29 Oct 2010 14:50:41 +0000 (16:50 +0200)]
[S390] vdso: get rid of redefinition warnings
The CLOCK_* defines in asm-offsets.c are only used for the vdso code
however in the meantime they cause other trouble.
Just rename them to get permanently rid of this:
In file included from /home2/heicarst/linux-2.6/arch/s390/include/asm/asm-offsets.h:1:0,
from arch/s390/mm/fault.c:33:
include/generated/asm-offsets.h:53:0: warning: "CLOCK_REALTIME" redefined
include/linux/time.h:286:0: note: this is the location of the previous definition
include/generated/asm-offsets.h:54:0: warning: "CLOCK_MONOTONIC" redefined
include/linux/time.h:287:0: note: this is the location of the previous definition
Michael Holzheu [Fri, 29 Oct 2010 14:50:39 +0000 (16:50 +0200)]
[S390] hypfs: Fix error handling in hypfs_diag initialization
Fix the following two error handling bugs in hypfs_diag_init():
* No need for calling diag204_free_buffer()
* Initialize name table only in case of LPAR and prevent error message
on non-LPAR systems.
Heiko Carstens [Fri, 29 Oct 2010 14:50:38 +0000 (16:50 +0200)]
[S390] topology: fix cpu masks for topology=off case
Fix cpu masks for 'topology=off' case. Folding of the scheduling domains
happen in such a way that everything belongs to the MC domain instead
of the CPU doimain.
This should fix a performance regression introduced with eafd2b6d "[S390] topology: use default MC domain initializer" and also
makes sure we have the same behavious as if CONFIG_SCHED_MC was not
selected at all.
Heiko Carstens [Fri, 29 Oct 2010 14:50:37 +0000 (16:50 +0200)]
[S390] topology: add SCHED_MC config option
This allows us to easily check for performance differences seen with
!CONFIG_SCHED_MC and topology=off.
Actually there shouldn't be any (besides a small overhead because of
additional code).
Heiko Carstens [Fri, 29 Oct 2010 14:50:36 +0000 (16:50 +0200)]
[S390] Kconfig: add machine type number to code generation options
Add machine type number to code generation options. Also clean up and
shorten quite a lot of help texts with respect to machine type and
architecture terminology.
Patrick McHardy [Fri, 29 Oct 2010 14:28:07 +0000 (16:28 +0200)]
netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=n
net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used
net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used
Chris Mason [Sun, 24 Oct 2010 15:01:27 +0000 (11:01 -0400)]
Btrfs: don't loop forever on bad btree blocks
When btrfs discovers the generation number in a btree block is
incorrect, it can loop forever without forcing the RAID
code to try a valid mirror, and without returning EIO.
Kevin Winchester [Fri, 29 Oct 2010 13:29:55 +0000 (15:29 +0200)]
PM / Runtime: Fix typo in status comparison causing warning
GCC version 4.5.1 gives the following warning:
drivers/base/power/runtime.c: In function ‘rpm_check_suspend_allowed’:
drivers/base/power/runtime.c:146:25: warning: comparison between ‘enum dpm_state’ and ‘enum rpm_status’
which seems to be a typo in that dev->power.runtime_status
should be compared instead of dev->power.status.
Josef Bacik [Thu, 28 Oct 2010 20:55:47 +0000 (16:55 -0400)]
Btrfs: let the user know space caching is enabled
If you mount -o space_cache, the option will be persistent across mounts, but to
make sure the user knows that they did this, emit a message telling them if they
didn't mount with -o space_cache but the feature is still used.
Josef Bacik [Tue, 21 Sep 2010 18:21:34 +0000 (14:21 -0400)]
Btrfs: Add a clear_cache mount option
If something goes wrong with the free space cache we need a way to make sure
it's not loaded on mount and that it's cleared for everybody. When you pass the
clear_cache option it will make it so all block groups are setup to be cleared,
which keeps them from being loaded and then they will be truncated when the
transaction is committed. Thanks,
Josef Bacik [Thu, 16 Sep 2010 20:19:09 +0000 (16:19 -0400)]
Btrfs: add support for mixed data+metadata block groups
There are just a few things that need to be fixed in the kernel to support mixed
data+metadata block groups. Mostly we just need to make sure that if we are
using mixed block groups that we continue to allocate mixed block groups as we
need them. Also we need to make sure __find_space_info will find our space info
if we search for DATA or METADATA only. Tested this with xfstests and it works
nicely. Thanks,
Josef Bacik [Thu, 16 Sep 2010 20:17:03 +0000 (16:17 -0400)]
Btrfs: check cache->caching_ctl before returning if caching has started
With the free space disk caching we can mark the block group as started with the
caching, but we don't have a caching ctl. This can race with anybody else who
tries to get the caching ctl before we cache (this is very hard to do btw). So
instead check to see if cache->caching_ctl is set, and if not return NULL.
Thanks,
Josef Bacik [Wed, 25 Aug 2010 20:54:15 +0000 (16:54 -0400)]
Btrfs: load free space cache if it exists
This patch actually loads the free space cache if it exists. The only thing
that really changes here is that we need to cache the block group if we're going
to remove an extent from it. Previously we did not do this since the caching
kthread would pick it up. With the on disk cache we don't have this luxury so
we need to make sure we read the on disk cache in first, and then remove the
extent, that way when the extent is unpinned the free space is added to the
block group. This has been tested with all sorts of things.
Josef Bacik [Fri, 2 Jul 2010 16:14:14 +0000 (12:14 -0400)]
Btrfs: write out free space cache
This is a simple bit, just dump the free space cache out to our preallocated
inode when we're writing out dirty block groups. There are a bunch of changes
in inode.c in order to account for special cases. Mostly when we're doing the
writeout we're holding trans_mutex, so we need to use the nolock transacation
functions. Also we can't do asynchronous completions since the async thread
could be blocked on already completed IO waiting for the transaction lock. This
has been tested with xfstests and btrfs filesystem balance, as well as my ENOSPC
tests. Thanks,
Paul Mundt [Fri, 29 Oct 2010 10:52:07 +0000 (19:52 +0900)]
sh: mach-se: Rip out superfluous 7751 PIO routines.
MRSHPC is wholly unused here, no need to trap it specially. If support is
added in the future it can be taken care of via platform data like on the
others.
Paul Mundt [Fri, 29 Oct 2010 10:34:13 +0000 (19:34 +0900)]
sh: mach-edosk7705: update for this century, kill off PIO trapping.
The only reason this board needs to do PIO trapping is for ethernet,
which happens to follow the same scheme as its bigger brother the
edosk7760. With ethernet properly supported through the platform device,
we can kill off the left over PIO abortion.
Paul Mundt [Fri, 29 Oct 2010 10:24:59 +0000 (19:24 +0900)]
sh: mach-se: Rip out superfluous 7206 PIO routines.
The PIO trapping was only for MRSHPC and the SMC ethernet. Given that the
SMC ethernet is already properly handled and that nothing is using the
MRSHPC, none of this is needed.