Matthew Wilcox [Tue, 8 Sep 2015 21:59:37 +0000 (14:59 -0700)]
dax: ensure that zero pages are removed from other processes
If the first access to a huge page was a store, there would be no existing
zero pmd in this process's page tables. There could be a zero pmd in
another process's page tables, if it had done a load. We can detect this
case by noticing that the buffer_head returned from the filesystem is New,
and ensure that other processes mapping this huge page have their page
tables flushed.
This is another place where DAX assumed that pgtable_t was a pointer.
Open code the important parts of set_huge_zero_page() in DAX and make
set_huge_zero_page() static again.
The original DAX code assumed that pgtable_t was a pointer, which isn't
true on all architectures. Restructure the code to not rely on that
assumption.
Matthew Wilcox [Tue, 8 Sep 2015 21:59:25 +0000 (14:59 -0700)]
dax: fix race between simultaneous faults
If two threads write-fault on the same hole at the same time, the winner
of the race will return to userspace and complete their store, only to
have the loser overwrite their store with zeroes. Fix this for now by
taking the i_mmap_sem for write instead of read, and do so outside the
call to get_block(). Now the loser of the race will see the block has
already been zeroed, and will not zero it again.
This severely limits our scalability. I have ideas for improving it, but
those can wait for a later patch.
Matthew Wilcox [Tue, 8 Sep 2015 21:59:22 +0000 (14:59 -0700)]
ext4: start transaction before calling into DAX
Jan Kara pointed out that in the case where we are writing to a hole, we
can end up with a lock inversion between the page lock and the journal
lock. We can avoid this by starting the transaction in ext4 before
calling into DAX. The journal lock nests inside the superblock
pagefault lock, so we have to duplicate that code from dax_fault, like
XFS does.
Matthew Wilcox [Tue, 8 Sep 2015 21:59:20 +0000 (14:59 -0700)]
ext4: add ext4_get_block_dax()
DAX wants different semantics from any currently-existing ext4 get_block
callback. Unlike ext4_get_block_write(), it needs to honour the
'create' flag, and unlike ext4_get_block(), it needs to be able to
return unwritten extents. So introduce a new ext4_get_block_dax() which
has those semantics.
We could also change ext4_get_block_write() to honour the 'create' flag,
but that might have consequences on other users that I do not currently
understand.
Matthew Wilcox [Tue, 8 Sep 2015 21:59:14 +0000 (14:59 -0700)]
thp: change insert_pfn's return type to void
It would make more sense to have all the return values from
vmf_insert_pfn_pmd() encoded in one place instead of having to follow
the convention into insert_pfn(). Suggested by Jeff Moyer.
Matthew Wilcox [Tue, 8 Sep 2015 21:59:11 +0000 (14:59 -0700)]
ext4: use ext4_get_block_write() for DAX
DAX relies on the get_block function either zeroing newly allocated
blocks before they're findable by subsequent calls to get_block, or
marking newly allocated blocks as unwritten. ext4_get_block() cannot
create unwritten extents, but ext4_get_block_write() can.
Matthew Wilcox [Tue, 8 Sep 2015 21:58:54 +0000 (14:58 -0700)]
mm: add vmf_insert_pfn_pmd()
Similar to vm_insert_pfn(), but for PMDs rather than PTEs. The 'vmf_'
prefix instead of 'vm_' prefix is intended to indicate that it returns a
VMF_ value rather than an errno (which would only have to be converted
into a VMF_ value anyway).
Andrew Morton [Tue, 8 Sep 2015 21:58:43 +0000 (14:58 -0700)]
dax: revert userfaultfd change
Undo the change which "userfaultfd: call handle_userfault() for
userfaultfd_missing() faults" made to set_huge_zero_page(). DAX will
need that return value.
Matthew Wilcox [Tue, 8 Sep 2015 21:58:40 +0000 (14:58 -0700)]
dax: move DAX-related functions to a new header
In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
defined in linux/mm.h. Given that we don't want to include <linux/mm.h>
in <linux/fs.h>, the easiest solution is to move the DAX-related
functions to a new header, <linux/dax.h>. We could also have moved
VM_FAULT_* definitions to a new header, or a different header that isn't
quite such a boil-the-ocean header as <linux/mm.h>, but this felt like
the best option.
thp: vma_adjust_trans_huge(): adjust file-backed VMA too
This series of patches adds support for using PMD page table entries to
map DAX files. We expect NV-DIMMs to start showing up that are many
gigabytes in size and the memory consumption of 4kB PTEs will be
astronomical.
The patch series leverages much of the Transparant Huge Pages
infrastructure, going so far as to borrow one of Kirill's patches from
his THP page cache series.
This patch (of 10):
Since we're going to have huge pages in page cache, we need to call adjust
file-backed VMA, which potentially can contain huge pages.
For now we call it for all VMAs.
Probably later we will need to introduce a flag to indicate that the VMA
has huge pages.
fails without this patch. Before the previous commit it gets the wrong
page, now it segfaults (which is imho better).
This is because copy_vma() wrongly assumes that if vma->vm_file == NULL
is irrelevant until the first fault which will use do_anonymous_page().
This is obviously wrong for the special mapping.
int main(void)
{
void *vdso = find_vdso_vaddr();
assert(vdso);
// of course they should differ, and they do so far
printf("vdso pages differ: %d\n",
!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
// split into 2 vma's
assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0);
// force another fault on the next check
assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0);
// now they no longer differ, the 2nd vm_pgoff is wrong
printf("vdso pages differ: %d\n",
!!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE));
return 0;
}
Output:
vdso pages differ: 1
vdso pages differ: 0
This is because split_vma() correctly updates ->vm_pgoff, but the logic
in insert_vm_struct() and special_mapping_fault() is absolutely broken,
so the fault at vdso + PAGE_SIZE return the 1st page. The same happens
if you simply unmap the 1st page.
special_mapping_fault() does:
pgoff = vmf->pgoff - vma->vm_pgoff;
and this is _only_ correct if vma->vm_start mmaps the first page from
->vm_private_data array.
vdso or any other user of install_special_mapping() is not anonymous,
it has the "backing storage" even if it is just the array of pages.
So we actually need to make vm_pgoff work as an offset in this array.
Note: this also allows to fix another problem: currently gdb can't access
"[vvar]" memory because in this case special_mapping_fault() doesn't work.
Now that we can use ->vm_pgoff we can implement ->access() and fix this.
special_mapping_fault() is absolutely broken. It seems it was always
wrong, but this didn't matter until vdso/vvar started to use more than
one page.
And after this change vma_is_anonymous() becomes really trivial, it
simply checks vm_ops == NULL. However, I do think the helper makes
sense. There are a lot of ->vm_ops != NULL checks, the helper makes the
caller's code more understandable (self-documented) and this is more
grep-friendly.
This patch (of 3):
Preparation. Add the new simple helper, vma_is_anonymous(vma), and change
handle_pte_fault() to use it. It will have more users.
The name is not accurate, say a hpet_mmap()'ed vma is not anonymous.
Perhaps it should be named vma_has_fault() instead. But it matches the
logic in mmap.c/memory.c (see next changes). "True" just means that a
page fault will use do_anonymous_page().
selftests/userfaultfd: fix compiler warnings on 32-bit
On 32-bit:
userfaultfd.c: In function 'locking_thread':
userfaultfd.c:152: warning: left shift count >= width of type
userfaultfd.c: In function 'uffd_poll_thread':
userfaultfd.c:295: warning: cast to pointer from integer of different size
userfaultfd.c: In function 'uffd_read_thread':
userfaultfd.c:332: warning: cast to pointer from integer of different size
Fix the shift warning by splitting the shift in two parts, and the
integer/pointer warnigns by adding intermediate casts to "unsigned long".
cgroup: fix seq_show_option merge with legacy_name
When seq_show_option (commit a068acf2ee77: "fs: create and use
seq_show_option for escaping") was merged, it did not correctly collide
with cgroup's addition of legacy_name (commit 3e1d2eed39d8: "cgroup:
introduce cgroup_subsys->legacy_name") changes.
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull misc kbuild updates from Michal Marek:
- deb-pkg:
+ module signing fix
+ dtb files are added to the package
+ do not require `hostname -f` to work during build
+ make deb-pkg generates a source package, bindeb-pkg has been
added to only generate the binary package
- rpm-pkg packages /lib/modules as well
- new coccinelle patch and updates to existing ones
- new stackusage & stackdelta script to collect and compare stack usage
info (using gcc's -fstack-usage)
- make tags understands trace_*_rcuidle() macros
- .gitignore updates, misc cleanups
* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (27 commits)
deb-pkg: add source package
package/Makefile: move source tar creation to a function
scripts: add stackdelta script
kbuild: remove *.su files generated by -fstack-usage
.gitignore: add *.su pattern
scripts: add stackusage script
kbuild: avoid listing /lib/modules in kernel spec file
fallback to hostname in scripts/package/builddeb
coccinelle: api: extend spatch for dropping unnecessary owner
deb-pkg: simplify directory creation
scripts/tags.sh: Include trace_*_rcuidle() in tags
scripts/package/Makefile: rpmbuild is needed for rpm targets
Kbuild: Add ID files to .gitignore
gitignore: Add MIPS vmlinux.32 to the list
coccinelle: simple_return: Add a blank line
coccinelle: irqf_oneshot.cocci: Improve the generated commit log
coccinelle: api: add vma_pages.cocci
scripts/coccinelle/misc/irqf_oneshot.cocci: Fix grammar
scripts/coccinelle/misc/semicolon.cocci: Use imperative mood
coccinelle: simple_open: Use imperative mood
...
Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kconfig updates from Michal Marek:
- kconfig warns about junk characters in Kconfig files
- merge_config.sh error handling
- small cleanup
* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
merge_config.sh: exit on missing input files
kconfig: Regenerate shipped zconf.{hash,lex}.c files
kconfig: warn of unhandled characters in Kconfig commands
kconfig: Delete unnecessary checks before the function call "sym_calc_value"
Merge tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing update from Steven Rostedt:
"Mostly this is just clean ups and micro optimizations.
The changes with more meat are:
- Allowing the trace event filters to filter on CPU number and
process ids
- Two new markers for trace output latency were added (10 and 100
msec latencies)
- Have tracing_thresh filter function profiling time
I also worked on modifying the ring buffer code for some future work,
and moved the adding of the timestamp around. One of my changes
caused a regression, and since other changes were built on top of it
and already tested, I had to operate a revert of that change. Instead
of rebasing, this change set has the code that caused a regression as
well as the code to revert that change without touching the other
changes that were made on top of it"
* tag 'trace-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Revert "ring-buffer: Get timestamp after event is allocated"
tracing: Don't make assumptions about length of string on task rename
tracing: Allow triggers to filter for CPU ids and process names
ftrace: Format MCOUNT_ADDR address as type unsigned long
tracing: Introduce two additional marks for delay
ftrace: Fix function_graph duration spacing with 7-digits
ftrace: add tracing_thresh to function profile
tracing: Clean up stack tracing and fix fentry updates
ring-buffer: Reorganize function locations
ring-buffer: Make sure event has enough room for extend and padding
ring-buffer: Get timestamp after event is allocated
ring-buffer: Move the adding of the extended timestamp out of line
ring-buffer: Add event descriptor to simplify passing data
ftrace: correct the counter increment for trace_buffer data
tracing: Fix for non-continuous cpu ids
tracing: Prefer kcalloc over kzalloc with multiply
Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit update from Paul Moore:
"This is one of the larger audit patchsets in recent history,
consisting of eight patches and almost 400 lines of changes.
The bulk of the patchset is the new "audit by executable"
functionality which allows admins to set an audit watch based on the
executable on disk. Prior to this, admins could only track an
application by PID, which has some obvious limitations.
Beyond the new functionality we also have some refcnt fixes and a few
minor cleanups"
* 'upstream' of git://git.infradead.org/users/pcmoore/audit:
fixup: audit: implement audit by executable
audit: implement audit by executable
audit: clean simple fsnotify implementation
audit: use macros for unset inode and device values
audit: make audit_del_rule() more robust
audit: fix uninitialized variable in audit_add_rule()
audit: eliminate unnecessary extra layer of watch parent references
audit: eliminate unnecessary extra layer of watch references
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
"Highlights:
- PKCS#7 support added to support signed kexec, also utilized for
module signing. See comments in 3f1e1bea.
** NOTE: this requires linking against the OpenSSL library, which
must be installed, e.g. the openssl-devel on Fedora **
- Smack
- add IPv6 host labeling; ignore labels on kernel threads
- support smack labeling mounts which use binary mount data
- SELinux:
- add ioctl whitelisting (see
http://kernsec.org/files/lss2015/vanderstoep.pdf)
- fix mprotect PROT_EXEC regression caused by mm change
- Seccomp:
- add ptrace options for suspend/resume"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (57 commits)
PKCS#7: Add OIDs for sha224, sha284 and sha512 hash algos and use them
Documentation/Changes: Now need OpenSSL devel packages for module signing
scripts: add extract-cert and sign-file to .gitignore
modsign: Handle signing key in source tree
modsign: Use if_changed rule for extracting cert from module signing key
Move certificate handling to its own directory
sign-file: Fix warning about BIO_reset() return value
PKCS#7: Add MODULE_LICENSE() to test module
Smack - Fix build error with bringup unconfigured
sign-file: Document dependency on OpenSSL devel libraries
PKCS#7: Appropriately restrict authenticated attributes and content type
KEYS: Add a name for PKEY_ID_PKCS7
PKCS#7: Improve and export the X.509 ASN.1 time object decoder
modsign: Use extract-cert to process CONFIG_SYSTEM_TRUSTED_KEYS
extract-cert: Cope with multiple X.509 certificates in a single file
sign-file: Generate CMS message as signature instead of PKCS#7
PKCS#7: Support CMS messages also [RFC5652]
X.509: Change recorded SKID & AKID to not include Subject or Issuer
PKCS#7: Check content type and versions
MAINTAINERS: The keyrings mailing list has moved
...
Merge branch 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull NMI backtrace update from Russell King:
"These changes convert the x86 NMI handling to be a library
implementation which other architectures can make use of. Thomas
Gleixner has reviewed and tested these changes, and wishes me to send
these rather than taking them through the tip tree.
The final patch in the set adds an initial implementation using this
infrastructure to ARM, even though it doesn't send the IPI at "NMI"
level. Patches are in progress to add the ARM equivalent of NMI, but
we still need the IRQ-level fallback for systems where the "NMI" isn't
available due to secure firmware denying access to it"
* 'nmi' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: add basic support for on-demand backtrace of other CPUs
nmi: x86: convert to generic nmi handler
nmi: create generic NMI backtrace implementation
Merge tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from David Vrabel:
"Xen features and fixes for 4.3:
- Convert xen-blkfront to the multiqueue API
- [arm] Support binding event channels to different VCPUs.
- [x86] Support > 512 GiB in a PV guests (off by default as such a
guest cannot be migrated with the current toolstack).
- [x86] PMU support for PV dom0 (limited support for using perf with
Xen and other guests)"
* tag 'for-linus-4.3-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (33 commits)
xen: switch extra memory accounting to use pfns
xen: limit memory to architectural maximum
xen: avoid another early crash of memory limited dom0
xen: avoid early crash of memory limited dom0
arm/xen: Remove helpers which are PV specific
xen/x86: Don't try to set PCE bit in CR4
xen/PMU: PMU emulation code
xen/PMU: Intercept PMU-related MSR and APIC accesses
xen/PMU: Describe vendor-specific PMU registers
xen/PMU: Initialization code for Xen PMU
xen/PMU: Sysfs interface for setting Xen PMU mode
xen: xensyms support
xen: remove no longer needed p2m.h
xen: allow more than 512 GB of RAM for 64 bit pv-domains
xen: move p2m list if conflicting with e820 map
xen: add explicit memblock_reserve() calls for special pages
mm: provide early_memremap_ro to establish read-only mapping
xen: check for initrd conflicting with e820 map
xen: check pre-allocated page tables for conflict with memory map
xen: check for kernel memory conflicting with memory layout
...
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more irq updates from Thomas Gleixner:
"The second part of irq related updates:
- Provide EOImode for GIC[V3] irq chips, which is a prerequisite for
direct interrupt handling in [KVM] guests"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/GIC: Fix EOImode setting for non-DT/ACPI systems
irqchip/GIC: Don't deactivate interrupts forwarded to a guest
irqchip/GIC: Convert to EOImode == 1
irqchip/GICv3: Don't deactivate interrupts forwarded to a guest
irqchip/GICv3: Convert to EOImode == 1
Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68k/colfire fixes from Greg Ungerer:
"Only a couple of patches this time. One migrating the clock driver
code to the new set-state interface. The other cleaning up to use the
PFN_DOWN macro"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68k/coldfire: use PFN_DOWN macro
m68k/coldfire/pit: Migrate to new 'set-state' interface
Merge tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull ecryptfs fixes from Tyler Hicks:
"Invalidate stale eCryptfs dcache entries caused by unlinked lower
inodes"
* tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Delete a check before the function call "key_put"
eCryptfs: Invalidate dcache entries when lower i_nlink is zero
Instead of using physical addresses for accounting of extra memory
areas available for ballooning switch to pfns as this is much less
error prone regarding partial pages.
When a pv-domain (including dom0) is started it tries to size it's
p2m list according to the maximum possible memory amount it ever can
achieve. Limit the initial maximum memory size to the architectural
limit of the hardware in order to avoid overflows during remapping
of memory.
This problem will occur when dom0 is started with an initial memory
size being a multiple of 1GB, but without specifying it's maximum
memory size. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen.
Juergen Gross [Wed, 19 Aug 2015 16:53:11 +0000 (18:53 +0200)]
xen: avoid another early crash of memory limited dom0
Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory occurring
only on some hardware.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same. The kernel must be configured without
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG for the problem to happen. If all
of this is true and the E820 map of the machine is sparse (some areas
are not covered) then the machine might crash early in the boot
process.
An example E820 map triggering the problem looks like this:
In this case the area a0000-dffff isn't present in the map. This will
confuse the memory setup of the domain when remapping the memory from
such holes to populated areas.
To avoid the problem the accounting of to be remapped memory has to
count such holes in the E820 map as well.
Juergen Gross [Wed, 19 Aug 2015 16:52:34 +0000 (18:52 +0200)]
xen: avoid early crash of memory limited dom0
Commit b1c9f169047b ("xen: split counting of extra memory pages...")
introduced an error when dom0 was started with limited memory.
The problem arises in case dom0 is started with initial memory and
maximum memory being the same and exactly a multiple of 1 GB. The
kernel must be configured without CONFIG_XEN_BALLOON_MEMORY_HOTPLUG
for the problem to happen. In this case it will crash very early
during boot due to the virtual mapped p2m list not being large
enough to be able to remap any memory:
This can be avoided by allocating aneough space for the p2m to cover
the maximum memory of dom0 plus the identity mapped holes required
for PCI space, BIOS etc.
parisc: Use double word condition in 64bit CAS operation
The attached change fixes the condition used in the "sub" instruction.
A double word comparison is needed. This fixes the 64-bit LWS CAS
operation on 64-bit kernels.
parisc: Filter out spurious interrupts in PA-RISC irq handler
When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
long way to go to find the right IRQ line, registering it, then registering the
serial port and the irq handler for the serial port. During this phase spurious
interrupts for the serial port may happen which then crashes the kernel because
the action handler might not have been set up yet.
So, basically it's a race condition between the serial port hardware and the
CPU which sets up the necessary fields in the irq sructs. The main reason for
this race is, that we unmask the serial port irqs too early without having set
up everything properly before (which isn't easily possible because we need the
IRQ number to register the serial ports).
This patch is a work-around for this problem. It adds checks to the CPU irq
handler to verify if the IRQ action field has been initialized already. If not,
we just skip this interrupt (which isn't critical for a serial port at bootup).
The real fix would probably involve rewriting all PA-RISC specific IRQ code
(for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
the irq chips and proper irq enabling along this line.
This bug has been in the PA-RISC port since the beginning, but the crashes
happened very rarely with currently used hardware. But on the latest machine
which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
crashed at every boot because of this race. So, without this patch the machine
would currently be unuseable.
For the record, here is the flow logic:
1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
5. serial_init_chip() then registers the 8250 port.
Problems:
- In step 4 the CPU irq shouldn't have been registered yet, but after step 5
- If serial irq happens between 4 and 5 have finished, the kernel will crash
parisc: Additionally check for in_atomic() in page fault handler
Craig Estey noticed that we didn't checked for in_atomic() in our page fault
handler like other architectures. This commit adds this check by using
faulthandler_disabled() which includes a check for pagefault_disabled() and
in_atomic().
PCI,parisc: Enable 64-bit bus addresses on PA-RISC
Commit 3a9ad0b ("PCI: Add pci_bus_addr_t") unconditionally introduced usage of
64-bit PCI bus addresses on all 64-bit platforms which broke PA-RISC.
It turned out that due to enabling the 64-bit addresses, the PCI logic decided
to use the GMMIO instead of the LMMIO region. This commit simply disables
registering the GMMIO and thus we fall back to use the LMMIO region as before.
Guenter Roeck [Sat, 1 Aug 2015 02:34:46 +0000 (19:34 -0700)]
parisc: Define ioremap_uc and ioremap_wc
Commit 3cc2dac5be3f ("drivers/video/fbdev/atyfb: Replace MTRR UC hole
with strong UC") introduces calls to ioremap_wc and ioremap_uc. This
causes build failures with parisc:allmodconfig. Map the missing
functions to ioremap_nocache.
Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable patches:
- Fix atomicity of pNFS commit list updates
- Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
- nfs_set_pgio_error sometimes misses errors
- Fix a thinko in xs_connect()
- Fix borkage in _same_data_server_addrs_locked()
- Fix a NULL pointer dereference of migration recovery ops for v4.2
client
- Don't let the ctime override attribute barriers.
- Revert "NFSv4: Remove incorrect check in can_open_delegated()"
- Ensure flexfiles pNFS driver updates the inode after write finishes
- flexfiles must not pollute the attribute cache with attrbutes from
the DS
- Fix a protocol error in layoutreturn
- Fix a protocol issue with NFSv4.1 CLOSE stateids
Bugfixes + cleanups
- pNFS blocks bugfixes from Christoph
- Various cleanups from Anna
- More fixes for delegation corner cases
- Don't fsync twice for O_SYNC/IS_SYNC files
- Fix pNFS and flexfiles layoutstats bugs
- pnfs/flexfiles: avoid duplicate tracking of mirror data
- pnfs: Fix layoutget/layoutreturn/return-on-close serialisation
issues
- pnfs/flexfiles: error handling retries a layoutget before fallback
to MDS
Features:
- Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from
Kinglong
- More RDMA client transport improvements from Chuck
- Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr()
verbs from the SUNRPC, Lustre and core infiniband tree.
- Optimise away the close-to-open getattr if there is no cached data"
* tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits)
NFSv4: Respect the server imposed limit on how many changes we may cache
NFSv4: Express delegation limit in units of pages
Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"
NFS: Optimise away the close-to-open getattr if there is no cached data
NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb
NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()
nfs: Remove unneeded checking of the return value from scnprintf
nfs: Fix truncated client owner id without proto type
NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid
NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid
NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()
NFSv4.1/flexfiles: Fix freeing of mirrors
NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
NFSv4.1/pnfs: Handle LAYOUTGET return values correctly
NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
NFSv4.1: Fix a protocol issue with CLOSE stateids
NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors
SUNRPC: Prevent SYN+SYNACK+RST storms
SUNRPC: xs_reset_transport must mark the connection as disconnected
NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload
...
Merge tag 'xfs-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs updates from Dave Chinner:
"There isn't a whole lot to this update - it's mostly bug fixes and
they are spread pretty much all over XFS. There are some corruption
fixes, some fixes for log recovery, some fixes that prevent unount
from hanging, a lockdep annotation rework for inode locking to prevent
false positives and the usual random bunch of cleanups and minor
improvements.
Deatils:
- large rework of EFI/EFD lifecycle handling to fix log recovery
corruption issues, crashes and unmount hangs
- separate metadata UUID on disk to enable changing boot label UUID
for v5 filesystems
- fixes for gcc miscompilation on certain platforms and optimisation
levels
- remote attribute allocation and recovery corruption fixes
- inode lockdep annotation rework to fix bugs with too many
subclasses
- directory inode locking changes to prevent lockdep false positives
- a handful of minor corruption fixes
- various other small cleanups and bug fixes"
* tag 'xfs-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (42 commits)
xfs: fix error gotos in xfs_setattr_nonsize
xfs: add mssing inode cache attempts counter increment
xfs: return errors from partial I/O failures to files
libxfs: bad magic number should set da block buffer error
xfs: fix non-debug build warnings
xfs: collapse allocsize and biosize mount option handling
xfs: Fix file type directory corruption for btree directories
xfs: lockdep annotations throw warnings on non-debug builds
xfs: Fix uninitialized return value in xfs_alloc_fix_freelist()
xfs: inode lockdep annotations broke non-lockdep build
xfs: flush entire file on dio read/write to cached file
xfs: Fix xfs_attr_leafblock definition
libxfs: readahead of dir3 data blocks should use the read verifier
xfs: stop holding ILOCK over filldir callbacks
xfs: clean up inode lockdep annotations
xfs: swap leaf buffer into path struct atomically during path shift
xfs: relocate sparse inode mount warning
xfs: dquots should be stamped with sb_meta_uuid
xfs: log recovery needs to validate against sb_meta_uuid
xfs: growfs not aware of sb_meta_uuid
...
NFSv4: Respect the server imposed limit on how many changes we may cache
The NFSv4 delegation spec allows the server to tell a client to limit how
much data it cache after the file is closed. In return, the server
guarantees enough free space to avoid ENOSPC situations, etc.
Prior to this patch, we assumed we could always cache aggressively after
close. Unfortunately, this causes problems with servers that set the
limit to 0 and therefore do not offer any ENOSPC guarantees.
Since we're tracking modifications to the page cache on a per-page
basis, it makes sense to express the limit to how much we may cache
in units of pages.
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:
"In this one:
- d_move fixes (Eric Biederman)
- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)
- switch of sb_writers to percpu rwsem (Oleg Nesterov)
- superblock scalability (Josef Bacik and Dave Chinner)
- swapon(2) race fix (Hugh Dickins)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...
Merge tag 'for-linus-4.3-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p updates from Eric Van Hensbergen:
"Just a few cleanups for 4.3 merge window for the 9p file system. I've
gotten several more over the past week, but this group has been in
for-next for at least a couple of weeks so I figured I'd push them
first while I test the rest.
Most of the ones not in this set are bug-fixes anyways so I could hold
them for rc1"
* tag 'for-linus-4.3-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix return code of read() when count is 0
9p: remove unused option Opt_trans
Merge tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- new DVB frontend drivers: ascot2e, cxd2841er, horus3a, lnbh25
- new HDMI capture driver: tc358743
- new driver for NetUP DVB new boards (netup_unidvb)
- IR support for DVBSky cards (smipcie-ir)
- Coda driver has gain macroblock tiling support
- Renesas R-Car gains JPEG codec driver
- new DVB platform driver for STi boards: c8sectpfe
- added documentation for the media core kABI to device-drivers DocBook
- lots of driver fixups, cleanups and improvements
* tag 'media/v4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (297 commits)
[media] c8sectpfe: Remove select on undefined LIBELF_32
[media] i2c: fix platform_no_drv_owner.cocci warnings
[media] cx231xx: Use wake_up_interruptible() instead of wake_up_interruptible_nr()
[media] tc358743: only queue subdev notifications if devnode is set
[media] tc358743: add missing Kconfig dependency/select
[media] c8sectpfe: Use %pad to print 'dma_addr_t'
[media] DocBook media: Fix typo "the the" in xml files
[media] tc358743: make reset gpio optional
[media] tc358743: set direction of reset gpio using devm_gpiod_get
[media] dvbdev: document most of the functions/data structs
[media] dvb_frontend.h: document the struct dvb_frontend
[media] dvb-frontend.h: document struct dtv_frontend_properties
[media] dvb-frontend.h: document struct dvb_frontend_ops
[media] dvb: Use DVBFE_ALGO_HW where applicable
[media] dvb_frontend.h: document struct analog_demod_ops
[media] dvb_frontend.h: Document struct dvb_tuner_ops
[media] Docbook: Document struct analog_parameters
[media] dvb_frontend.h: get rid of dvbfe_modcod
[media] add documentation for struct dvb_tuner_info
[media] dvb_frontend: document dvb_frontend_tune_settings
...
Merge branch 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
"Mainly we move from jiffy based timer to HRTIMER for finer control
over polling. Then a controller reduces its polling period from 10 to
1ms"
* 'mailbox-for-next' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: arm_mhu: reduce txpoll_period from 10ms to 1 ms
mailbox: switch to hrtimer for tx_complete polling
mailbox: Drop owner assignment from platform_driver
- an assortment of little fixes, several for minor races only likely to
be hit during testing
- further cluster-md-raid1 development, not ready for real use yet.
- new RAID6 syndrome code for ARM NEON
- fix a race where a write can return before failure of one device is
properly recorded in metadata, so an immediate crash might result in
that write being lost.
* tag 'md/4.3' of git://neil.brown.name/md: (33 commits)
md/raid5: ensure device failure recorded before write request returns.
md/raid5: use bio_list for the list of bios to return.
md/raid10: ensure device failure recorded before write request returns.
md/raid1: ensure device failure recorded before write request returns.
md-cluster: remove inappropriate try_module_get from join()
md: extend spinlock protection in register_md_cluster_operations
md-cluster: Read the disk bitmap sb and check if it needs recovery
md-cluster: only call complete(&cinfo->completion) when node join cluster
md-cluster: add missed lockres_free
md-cluster: remove the unused sb_lock
md-cluster: init suspend_list and suspend_lock early in join
md-cluster: add the error check if failed to get dlm lock
md-cluster: init completion within lockres_init
md-cluster: fix deadlock issue on message lock
md-cluster: transfer the resync ownership to another node
md-cluster: split recover_slot for future code reuse
md-cluster: use %pU to print UUIDs
md: setup safemode_timer before it's being used
md/raid5: handle possible race as reshape completes.
md: sync sync_completed has correct value as recovery finishes.
...
Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Nothing major, but:
- Add Jeff Layton as an nfsd co-maintainer: no change to existing
practice, just an acknowledgement of the status quo.
- Two patches ("nfsd: ensure that...") for a race overlooked by the
state locking rewrite, causing a crash noticed by multiple users.
- Lots of smaller bugfixes all over from Kinglong Mee.
- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues"
* tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits)
nfsd: deal with DELEGRETURN racing with CB_RECALL
nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
nfsd: ensure that delegation stateid hash references are only put once
nfsd: ensure that the ol stateid hash reference is only put once
net: sunrpc: fix tracepoint Warning: unknown op '->'
nfsd: allow more than one laundry job to run at a time
nfsd: don't WARN/backtrace for invalid container deployment.
fs: fix fs/locks.c kernel-doc warning
nfsd: Add Jeff Layton as co-maintainer
NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
NFSD: Store parent's stat in a separate value
nfsd: Fix two typos in comments
lockd: NLM grace period shouldn't block NFSv4 opens
nfsd: include linux/nfs4.h in export.h
sunrpc: Switch to using hash list instead single list
sunrpc/nfsd: Remove redundant code by exports seq_operations functions
sunrpc: Store cache_detail in seq_file's private directly
...
Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"This has Jeff Mahoney's long standing trim patch that fixes corners
where trims were missing. Omar has some raid5/6 fixes, especially for
using scrub and device replace when devices are missing.
Zhao Lie continues cleaning and fixing things, this series fixes some
really hard to hit corners in xfstests. I had to pull it last merge
window due to some deadlocks, but those are now resolved.
I added support for Tejun's new blkio controllers. It seems to work
well for single devices, we'll expand to multi-device as well"
* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (47 commits)
btrfs: fix compile when block cgroups are not enabled
Btrfs: fix file read corruption after extent cloning and fsync
Btrfs: check if previous transaction aborted to avoid fs corruption
btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
btrfs: Prevent from early transaction abort
btrfs: Remove unused arguments in tree-log.c
btrfs: Remove useless condition in start_log_trans()
Btrfs: add support for blkio controllers
Btrfs: remove unused mutex from struct 'btrfs_fs_info'
Btrfs: fix parity scrub of RAID 5/6 with missing device
Btrfs: fix device replace of a missing RAID 5/6 device
Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
Btrfs: count devices correctly in readahead during RAID 5/6 replace
Btrfs: remove misleading handling of missing device scrub
btrfs: fix clone / extent-same deadlocks
Btrfs: fix defrag to merge tail file extent
Btrfs: fix warning in backref walking
btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()
btrfs: Remove root argument in extent_data_ref_count()
btrfs: Fix wrong comment of btrfs_alloc_tree_block()
...
- some of MM. Includes Andrea's userfaultfd feature.
[ Hadn't noticed that userfaultfd was 'default y' when applying the
patches, so that got fixed in this merge instead. We do _not_ mark
new features that nobody uses yet 'default y' - Linus ]
* emailed patches from Andrew Morton <[email protected]>: (118 commits)
mm/hugetlb.c: make vma_has_reserves() return bool
mm/madvise.c: make madvise_behaviour_valid() return bool
mm/memory.c: make tlb_next_batch() return bool
mm/dmapool.c: change is_page_busy() return from int to bool
mm: remove struct node_active_region
mremap: simplify the "overlap" check in mremap_to()
mremap: don't do uneccesary checks if new_len == old_len
mremap: don't do mm_populate(new_addr) on failure
mm: move ->mremap() from file_operations to vm_operations_struct
mremap: don't leak new_vma if f_op->mremap() fails
mm/hugetlb.c: make vma_shareable() return bool
mm: make GUP handle pfn mapping unless FOLL_GET is requested
mm: fix status code which move_pages() returns for zero page
mm: memcontrol: bring back the VM_BUG_ON() in mem_cgroup_swapout()
genalloc: add support of multiple gen_pools per device
genalloc: add name arg to gen_pool_get() and devm_gen_pool_create()
mm/memblock: WARN_ON when nid differs from overlap region
Documentation/features/vm: add feature description and arch support status for batched TLB flush after unmap
mm: defer flush of writable TLB entries
mm: send one IPI per CPU to TLB flush all entries after unmapping pages
...
Eric Dumazet [Sat, 29 Aug 2015 02:42:30 +0000 (19:42 -0700)]
task_work: remove fifo ordering guarantee
In commit f341861fb0b ("task_work: add a scheduling point in
task_work_run()") I fixed a latency problem adding a cond_resched()
call.
Later, commit ac3d0da8f329 added yet another loop to reverse a list,
bringing back the latency spike :
I've seen in some cases this loop taking 275 ms, if for example a
process with 2,000,000 files is killed.
We could add yet another cond_resched() in the reverse loop, or we
can simply remove the reversal, as I do not think anything
would depend on order of task_work_add() submitted works.
Keerthy [Tue, 18 Aug 2015 09:41:14 +0000 (15:11 +0530)]
ARM: dts: AM437x: Add the internal and external clock nodes for rtc
rtc can either be supplied from internal 32k clock or external crystal
generated 32k clock. Internal clock is SOC specific and the external
clock is board dependent. Adding the corresponding nodes.
Joonyoung Shim [Fri, 21 Aug 2015 09:43:41 +0000 (18:43 +0900)]
rtc: s5m: fix to update ctrl register
According to datasheet, the S2MPS13X and S2MPS14X should update write
buffer via setting WUDR bit to high after ctrl register is written.
If not, ALARM interrupt of rtc-s5m doesn't happen first time when i use
tools/testing/selftests/timers/rtctest.c test program and hour format is
used to 12 hour mode in Odroid-XU3 board.
One more issue is the RTC doesn't keep time on Odroid-XU3 board when i
turn on board after power off even if RTC battery is connected. It can
be solved as setting WUDR & RUDR bits to high at the same time after
RTC_CTRL register is written. It's same with condition of only writing
ALARM registers, so this is for only S2MPS14 and we should set WUDR &
A_UDR bits to high on S2MPS13.
I can't find any reasonable description about this like fix from
datasheet, but can find similar codes from rtc driver source of
hardkernel kernel and vendor kernel.
Rob Herring [Mon, 1 Jun 2015 12:53:01 +0000 (07:53 -0500)]
ARM: config: Switch PXA27x platforms to use PXA RTC driver
With the SA1100 and PXA RTC drivers be mutually exclusive and no
longer sharing hardware, PXA27x/PXA3xx platforms must use the PXA RTC
driver as the SA1100 platform device is no longer registered.
This change should be almost transparent to userspace. Former users of
pxa-rtc should be aware that 2 RTCs will be available on their kernels,
rtc0 being sa1100-rtc and rtc1 being pxa-rtc. Any userspace relying on
the fact that rtc0 was pxa-rtc should be fixed.
As a consequence:
- the first reboot after the switch will have the wrong time,
- on dual boot platform where the other OS programs some logic into the
sa1100 rtc IP, a lack of fix in userspace, ie. a kernel changing
sa1100-rtc thinking it is pxa-rtc could have dire consequence, such
as wiping the other OS data partition.
(Thanks to Robert Jarmik for help on the above commit text.)
Rob Herring [Tue, 3 Feb 2015 20:44:51 +0000 (14:44 -0600)]
rtc: sa1100/pxa: convert to run-time register mapping
SA1100 and PXA differ only in register offsets which are currently
hardcoded in a machine specific header. Some arm64 platforms (PXA1928)
have this RTC block as well (and not the PXA270 variant).
Convert the driver to use ioremap and set the register offsets dynamically.
Since we are touching all the register accesses, convert them all to
readl_relaxed/writel_relaxed.
Rob Herring [Mon, 2 Feb 2015 23:50:32 +0000 (17:50 -0600)]
ARM: pxa: add memory resource to SA1100 RTC device
The drivers for the SA1100 and PXA RTCs are now mutually exclusive, so
add the memory resource for the sa1100-rtc device. Since the memory
resource is already present in the pxa_rtc_resources, that makes
sa1100_rtc_resources and pxa_rtc_resources equivalent, so use
pxa_rtc_resources for both devices and remove the duplicate
sa1100_rtc_resources.
Rob Herring [Wed, 13 May 2015 14:20:04 +0000 (09:20 -0500)]
rtc: pxa: convert to use shared sa1100 functions
Currently, the rtc-sa1100 and rtc-pxa drivers co-exist as rtc-pxa has a
superset of functionality. Having 2 drivers sharing the same memory
resource is not allowed by the driver model if resources are properly
declared. This problem was avoided by not adding memory resources to the
SA1100 RTC driver, but that prevents clean-up of the SA1100 driver.
This commit converts the PXA RTC to use the exported SA1100 RTC
functions. Now the sa1100-rtc and pxa-rtc devices are mutually
exclusive, so we must remove the sa1100-rtc from pxa27x and pxa3xx.
Rob Herring [Tue, 12 May 2015 21:23:23 +0000 (16:23 -0500)]
rtc: sa1100: prepare to share sa1100_rtc_ops
Factor out the RTC initialization from the platform device specific
parts in order to share the RTC device ops with other drivers.
Specifically, it will be shared with rtc-pxa driver.
Wang Dongsheng [Wed, 12 Aug 2015 09:14:13 +0000 (17:14 +0800)]
rtc: ds3232: fix WARNING trace in resume function
If ds3232 work on some platform that is not implementing
irq_set_wake, ds3232 will get a WARNING trace in resume.
So fix ds3232->suspended state to false when irq_set_irq_wake
return error.
WARNING: CPU: 0 PID: 729 at kernel/irq/manage.c:604 irq_set_irq_wake+0x4b/0x8c()
Unbalanced IRQ 201 wake disable
Modules linked in:
CPU: 0 PID: 729 Comm: sh Not tainted 3.12.19-rt30+ #25
[<800107d9>] (unwind_backtrace+0x1/0x88) from [<8000e4ef>] (show_stack+0xb/0xc)
[<8000e4ef>] (show_stack+0xb/0xc) from [<802b5fa9>] (dump_stack+0x4d/0x60)
[<802b5fa9>] (dump_stack+0x4d/0x60) from [<800186dd>] (warn_slowpath_common+0x45/0x64)
[<800186dd>] (warn_slowpath_common+0x45/0x64) from [<80018717>] (warn_slowpath_fmt+0x1b/0x24)
[<80018717>] (warn_slowpath_fmt+0x1b/0x24) from [<8003a8d3>] (irq_set_irq_wake+0x4b/0x8c)
[<8003a8d3>] (irq_set_irq_wake+0x4b/0x8c) from [<80204fcb>] (ds3232_resume+0x2d/0x36)
[<80204fcb>] (ds3232_resume+0x2d/0x36) from [<801954c7>] (dpm_run_callback.isra.13+0xb/0x28)
[<801954c7>] (dpm_run_callback.isra.13+0xb/0x28) from [<80195b1b>] (device_resume+0x7b/0xa2)
[<80195b1b>] (device_resume+0x7b/0xa2) from [<80195f0f>] (dpm_resume+0xbb/0x19c)
[<80195f0f>] (dpm_resume+0xbb/0x19c) from [<801960d9>] (dpm_resume_end+0x9/0x12)
[<801960d9>] (dpm_resume_end+0x9/0x12) from [<80037e1d>] (suspend_devices_and_enter+0x17d/0x1d0)
[<80037e1d>] (suspend_devices_and_enter+0x17d/0x1d0) from [<80037ee1>] (pm_suspend+0x71/0x128)
[<80037ee1>] (pm_suspend+0x71/0x128) from [<80037449>] (state_store+0x6d/0x80)
[<80037449>] (state_store+0x6d/0x80) from [<800af4d5>] (sysfs_write_file+0x9f/0xde)
[<800af4d5>] (sysfs_write_file+0x9f/0xde) from [<8007a437>] (vfs_write+0x7b/0x104)
[<8007a437>] (vfs_write+0x7b/0x104) from [<8007a7f7>] (SyS_write+0x27/0x48)
[<8007a7f7>] (SyS_write+0x27/0x48) from [<8000c121>] (ret_fast_syscall+0x1/0x44)
Joonyoung Shim [Tue, 11 Aug 2015 11:28:19 +0000 (20:28 +0900)]
rtc: s3c: add missing clk control
It's missed to call clk_unprepare() about info->rtc_src_clk in
s3c_rtc_remove and to call clk_disable_unprepare about info->rtc_clk in
error routine of s3c_rtc_probe.
Joonyoung Shim [Wed, 12 Aug 2015 10:21:46 +0000 (19:21 +0900)]
rtc: s3c: fix disabled clocks for alarm
The clock enable/disable codes for alarm have been removed from
commit 24e1455493da ("drivers/rtc/rtc-s3c.c: delete duplicate clock
control") and the clocks are disabled even if alarm is set, so alarm
interrupt can't happen.
The s3c_rtc_setaie function can be called several times with 'enabled'
argument having same value, so it needs to check whether clocks are
enabled or not.
Nadav Haklai [Thu, 6 Aug 2015 15:18:48 +0000 (17:18 +0200)]
rtc: armada38x: Align RTC set time procedure with the official errata
According to the Armada38x functional errata FE-3124064, writing to
the RTC TIME register may fail. As a workaround, after writing to RTC
TIME register, issue a dummy write of 0x0 twice to the RTC Status
register. This is the updated implementation of the Errata that
eliminates the need of the long 100ms delay during the RTC set time
procedure.