==========================================================
How to access I/O mapped memory from within device drivers
==========================================================

:Author: Linus

.. warning::

	The virt_to_bus() and bus_to_virt() functions have been
	superseded by the functionality provided by the PCI DMA interface
	(see Documentation/DMA-API-HOWTO.txt).  They continue
	to be documented below for historical purposes, but new code
	must not use them. --davidm 00/12/12

::

  [ This is a mail message in response to a query on IO mapping, thus the
    strange format for a "document" ]

The AHA-1542 is a bus-master device, and your patch makes the driver give the
controller the physical address of the buffers, which is correct on x86
(because all bus master devices see the physical memory mappings directly). 

However, on many setups, there are actually **three** different ways of looking
at memory addresses, and in this case we actually want the third, the
so-called "bus address". 

Essentially, the three ways of addressing memory are (this is "real memory",
that is, normal RAM--see later about other details): 

 - CPU untranslated.  This is the "physical" address.  Physical address 
   0 is what the CPU sees when it drives zeroes on the memory bus.

 - CPU translated address. This is the "virtual" address, and is 
   completely internal to the CPU itself with the CPU doing the appropriate
   translations into "CPU untranslated". 

 - bus address. This is the address of memory as seen by OTHER devices, 
   not the CPU. Now, in theory there could be many different bus 
   addresses, with each device seeing memory in some device-specific way, but
   happily most hardware designers aren't actually actively trying to make
   things any more complex than necessary, so you can assume that all 
   external hardware sees the memory the same way. 

Now, on normal PCs the bus address is exactly the same as the physical
address, and things are very simple indeed. However, they are that simple
because the memory and the devices share the same address space, and that is
not necessarily true on other PCI/ISA setups.

Now, just as an example, on the PReP (PowerPC Reference Platform), the 
CPU sees a memory map something like this (this is from memory)::

	0-2 GB		"real memory"
	2 GB-3 GB	"system IO" (inb/out and similar accesses on x86)
	3 GB-4 GB 	"IO memory" (shared memory over the IO bus)

Now, that looks simple enough. However, when you look at the same thing from
the viewpoint of the devices, you have the reverse, and the physical memory
address 0 actually shows up as address 2 GB for any IO master.

So when the CPU wants any bus master to write to physical memory 0, it 
has to give the master address 0x80000000 as the memory address.
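
To make the arithmetic concrete, here is a small C sketch of just the PReP case described above. The names `PREP_BUS_RAM_OFFSET`, `prep_phys_to_bus()` and `prep_bus_to_phys()` are hypothetical, and the 2 GB offset is only the one quoted in the text (which is itself from memory), so treat this as an illustration rather than a description of real PReP code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the PReP layout described above: RAM that the
 * CPU sees at physical 0-2 GB shows up 2 GB higher to bus masters.
 */
#define PREP_BUS_RAM_OFFSET UINT32_C(0x80000000)

/* Physical RAM address -> address a bus master must be given. */
uint32_t prep_phys_to_bus(uint32_t phys)
{
	assert(phys < PREP_BUS_RAM_OFFSET);	/* only "real memory" (0-2 GB) */
	return phys + PREP_BUS_RAM_OFFSET;
}

/* Address as seen by a bus master -> physical RAM address. */
uint32_t prep_bus_to_phys(uint32_t bus)
{
	assert(bus >= PREP_BUS_RAM_OFFSET);
	return bus - PREP_BUS_RAM_OFFSET;
}
```

With this model, a buffer at physical 0x1000 has to be handed to a bus master as 0x80001000.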

So, for example, depending on how the kernel is actually mapped on the 
PPC, you can end up with a setup like this::

 physical address:	0
 virtual address:	0xC0000000
 bus address:		0x80000000

where all the addresses actually point to the same thing.  It's just seen 
through different translations.

Similarly, on the Alpha, the normal translation is::

 physical address:	0
 virtual address:	0xfffffc0000000000
 bus address:		0x40000000

(but there are also Alphas where the physical address and the bus address
are the same). 

Anyway, to look up all these translations, you do::

	#include <asm/io.h>

	phys_addr = virt_to_phys(virt_addr);
	virt_addr = phys_to_virt(phys_addr);
	 bus_addr = virt_to_bus(virt_addr);
	virt_addr = bus_to_virt(bus_addr);
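
The four helpers above are architecture-specific in the real kernel. As a rough mental model only, assuming every translation is a fixed offset from the physical address (true for the PReP and Alpha numbers quoted earlier, not necessarily elsewhere), they could be sketched in plain C like this; `struct addr_model` and the `model_*` functions are made-up names:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not the kernel's implementation: each platform is
 * described by two fixed offsets from the physical address of RAM.
 */
struct addr_model {
	uint64_t virt_offset;	/* virtual = physical + virt_offset */
	uint64_t bus_offset;	/* bus     = physical + bus_offset  */
};

/* Example translations quoted in the text above. */
const struct addr_model prep_model  = { UINT64_C(0xC0000000), UINT64_C(0x80000000) };
const struct addr_model alpha_model = { UINT64_C(0xfffffc0000000000), UINT64_C(0x40000000) };

uint64_t model_virt_to_phys(const struct addr_model *m, uint64_t virt)
{
	return virt - m->virt_offset;
}

uint64_t model_phys_to_virt(const struct addr_model *m, uint64_t phys)
{
	return phys + m->virt_offset;
}

/* In this model, virt <-> bus always goes through the physical address. */
uint64_t model_virt_to_bus(const struct addr_model *m, uint64_t virt)
{
	return model_virt_to_phys(m, virt) + m->bus_offset;
}

uint64_t model_bus_to_virt(const struct addr_model *m, uint64_t bus)
{
	return model_phys_to_virt(m, bus - m->bus_offset);
}
```

On a normal PC, as described above, `bus_offset` would simply be 0.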

Now, when do you need these?

You want the **virtual** address when you are actually going to access that
pointer from the kernel. So you can have something like this::

	/*
	 * this is the hardware "mailbox" we use to communicate with
	 * the controller. The controller sees this directly.
	 */
	struct mailbox {
		__u32 status;
		__u32 bufstart;
		__u32 buflen;
		/* ... */
	} mbox;

	unsigned char *retbuffer;

	/* get the address from the controller */
	retbuffer = bus_to_virt(mbox.bufstart);
	switch (retbuffer[0]) {
	case STATUS_OK:
		/* ... */
		break;
	}
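
The mailbox example only works because the driver translates before dereferencing. The following user-space simulation sketches that direction: an array stands in for RAM, a physical address is an index into it, and the bus sees RAM at a PReP-style 2 GB offset. `sim_ram`, `sim_bus_to_virt()`, `sim_controller_reply()`, `driver_reply_ok()` and the `STATUS_OK` value are all hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical simulation, not kernel code: "RAM" is a small array, a
 * physical address is an offset into it, and bus masters see RAM
 * SIM_BUS_OFFSET higher (the PReP-style 2 GB offset from the text).
 */
#define SIM_RAM_SIZE   4096u
#define SIM_BUS_OFFSET UINT32_C(0x80000000)
#define STATUS_OK      0x00	/* hypothetical status code */

unsigned char sim_ram[SIM_RAM_SIZE];

struct mailbox {
	uint32_t status;
	uint32_t bufstart;	/* bus address, as the controller sees it */
	uint32_t buflen;
};

/* Stand-in for bus_to_virt(): bus address -> pointer the CPU can use. */
unsigned char *sim_bus_to_virt(uint32_t bus)
{
	uint32_t phys = bus - SIM_BUS_OFFSET;

	assert(phys < SIM_RAM_SIZE);
	return &sim_ram[phys];
}

/*
 * Pretend controller: leaves a one-byte reply at physical 0x100 and
 * reports its location by bus address.
 */
void sim_controller_reply(struct mailbox *mbox, unsigned char code)
{
	const uint32_t phys = 0x100;

	sim_ram[phys] = code;
	mbox->bufstart = phys + SIM_BUS_OFFSET;
	mbox->buflen = 1;
	mbox->status = 1;	/* hypothetical "reply ready" flag */
}

/* Driver side: translate the bus address before dereferencing it. */
int driver_reply_ok(const struct mailbox *mbox)
{
	const unsigned char *retbuffer = sim_bus_to_virt(mbox->bufstart);

	return retbuffer[0] == STATUS_OK;
}
```

Using `mbox->bufstart` directly would treat 0x80000100 as a CPU address, which is exactly the mistake the translation avoids.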

On the other hand, you want the bus address when you have a buffer that
you want to give to the controller::

	/* ask the controller to read the sense status into "sense_buffer" */
	mbox.bufstart = virt_to_bus(&sense_buffer);
	mbox.buflen = sizeof(sense_buffer);
	mbox.status = 0;
	notify_controller(&mbox);
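
The other direction can be simulated the same way. In this sketch (same hypothetical array-backed "RAM" and 2 GB bus offset; `sim_virt_to_bus()`, `driver_request_sense()` and `sim_controller_dma()` are made-up names), the driver describes its buffer by bus address, and the pretend bus master writes the sense data through that address:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulation: array-backed "RAM", PReP-style bus offset. */
#define SIM_RAM_SIZE   4096u
#define SIM_BUS_OFFSET UINT32_C(0x80000000)

unsigned char sim_ram[SIM_RAM_SIZE];

struct mailbox {
	uint32_t status;
	uint32_t bufstart;	/* bus address of the buffer */
	uint32_t buflen;
};

/* Stand-in for virt_to_bus(): CPU pointer into sim_ram -> bus address. */
uint32_t sim_virt_to_bus(const void *virt)
{
	ptrdiff_t phys = (const unsigned char *)virt - sim_ram;

	assert(phys >= 0 && (size_t)phys < SIM_RAM_SIZE);
	return (uint32_t)phys + SIM_BUS_OFFSET;
}

/* Driver side: hand the controller the buffer's bus address. */
void driver_request_sense(struct mailbox *mbox, unsigned char *sense_buffer,
			  uint32_t len)
{
	mbox->bufstart = sim_virt_to_bus(sense_buffer);
	mbox->buflen = len;
	mbox->status = 0;
}

/*
 * Pretend bus master: the bus-to-RAM translation done by the hardware
 * is modelled by subtracting the offset before writing into "RAM".
 */
void sim_controller_dma(struct mailbox *mbox, const unsigned char *data,
			uint32_t len)
{
	uint32_t phys = mbox->bufstart - SIM_BUS_OFFSET;

	assert(len <= mbox->buflen);
	assert(phys < SIM_RAM_SIZE && len <= SIM_RAM_SIZE - phys);
	memcpy(&sim_ram[phys], data, len);
	mbox->status = 1;	/* hypothetical "done" flag */
}
```

If the driver had put the physical address (0x200 here) in `bufstart`, the simulated controller's bounds check would fail instead of the data landing in `sense_buffer`.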

And you generally **never** want to use the physical address, because you can't
use that from the CPU (the CPU only uses translated virtual addresses), and
you can't use it from the bus master. 

So why do we care about the physical address at all? We do need the physical
address in some cases, it's just not needed very often in normal code.  The physical
address is needed if you use memory mappings, for example, because the
"remap_pfn_range()" mm function wants the physical address of the memory to
be remapped as measured in units of pages, a.k.a. the pfn (the memory
management layer doesn't know about devices outside the CPU, so it
shouldn't need to know about "bus addresses" etc).
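
As a worked example of the "units of pages" point: a pfn is just the physical address shifted right by the page shift, with the low bits (the offset inside the page) dropped. This sketch assumes 4 KiB pages (a shift of 12); in the kernel PAGE_SHIFT is architecture-specific, and a driver's mmap handler would typically pass something like `virt_to_phys(buf) >> PAGE_SHIFT` as the pfn argument of remap_pfn_range(). The helper names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The real PAGE_SHIFT depends on the architecture. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Page frame number: the physical address in units of pages. */
uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/* First physical address of a page frame. */
uint64_t pfn_to_phys(uint64_t pfn)
{
	return pfn << SKETCH_PAGE_SHIFT;
}

/* Offset of a physical address inside its page (not carried by the pfn). */
uint64_t phys_page_offset(uint64_t phys)
{
	return phys & (SKETCH_PAGE_SIZE - 1);
}
```

Because the pfn carries no in-page offset, memory that is to be remapped this way should start on a page boundary.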

.. note::

	The above is only one part of the whole equation. The above
	only talks about "real memory", that is, CPU memory (RAM).

There is a completely different type of memory too, and that's the "shared
memory" on the PCI or ISA bus. That's generally not RAM (although in the case
of a video graphics card it can be normal DRAM that is just used for a frame
buffer), but can be things like a packet buffer in a network card etc. 

This memory is called "PCI memory" or "shared memory" or "IO memory" or
whatever, and there is only one way to access it: the readb/writeb and
related functions. You should never take the address of such memory, because
there is really nothing you can do with such an address: it's not
conceptually in the same memory space as "real memory" at all, so you cannot
just dereference a pointer. (Sadly, on x86 it **is** in the same memory space,
so on x86 it actually works to just dereference a pointer, but it's not
portable). 

For such memory, you can do things like:

 - reading::

	/*
	 * read first 32 bits from ISA memory at 0xC0000, aka
	 * C000:0000 in DOS terms
	 */
	unsigned int signature = isa_readl(0xC0000);

 - remapping and writing::

	/*
	 * remap framebuffer PCI memory area at 0xFC000000,
	 * size 1MB, so that we can access it: We can directly
	 * access only the 640k-1MB area, so anything else
	 * has to be remapped.
	 */
	void __iomem *baseptr = ioremap(0xFC000000, 1024*1024);
	if (!baseptr)
		return -ENOMEM;

	/* write an 'A' to the offset 10 of the area */
	writeb('A', baseptr + 10);

	/* unmap when we unload the driver */
	iounmap(baseptr);

 - copying and clearing::

	/* get the 6-byte Ethernet address at ISA address E000:0040 */
	memcpy_fromio(kernel_buffer, 0xE0040, 6);
	/* write a packet to the card */
	memcpy_toio(0xE1000, skb->data, skb->len);
	/* clear the frame buffer */
	memset_io(0xA0000, 0, 0x10000);
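
The ISA addresses above appear both as linear addresses and in DOS segment:offset form (C000:0000 is 0xC0000, E000:0040 is 0xE0040), because real-mode x86 forms an address as segment * 16 + offset. A small sketch of that conversion, plus a check for the 640k-1MB window that the ioremap() comment says can be accessed directly; both function names are hypothetical helpers, not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-mode x86 "segment:offset" -> linear address: segment * 16 + offset. */
uint32_t dos_seg_off_to_linear(uint16_t seg, uint16_t off)
{
	return ((uint32_t)seg << 4) + off;
}

/* The 640k-1MB ISA window from the ioremap() comment: 0xA0000-0xFFFFF. */
bool in_isa_hole(uint32_t addr)
{
	return addr >= UINT32_C(0xA0000) && addr < UINT32_C(0x100000);
}
```

The framebuffer at 0xFC000000 falls far outside that window, which is why it has to be remapped first.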

OK, that just about covers the basics of accessing IO portably.  Questions?
Comments? You may think that all the above is overly complex, but one day you
might find yourself with a 500 MHz Alpha in front of you, and then you'll be
happy that your driver works ;)

Note that kernel versions 2.0.x (and earlier) mistakenly called the
ioremap() function "vremap()".  ioremap() is the proper name, but I
didn't think straight when I wrote it originally.  People who have to
support both can do something like::
 
	/* support old naming silliness */
	#if LINUX_VERSION_CODE < 0x020100
	#define ioremap vremap
	#define iounmap vfree                                                     
	#endif
 
at the top of their source files, and then they can use the right names
even on 2.0.x systems. 
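
The magic number 0x020100 in that test is version 2.1.0 packed one byte per component. `<linux/version.h>` provides a `KERNEL_VERSION(a,b,c)` macro that, in its classic form, packs versions the same way, so the check can also be written as `LINUX_VERSION_CODE < KERNEL_VERSION(2,1,0)`. Assuming very old headers might lack the macro, this sketch defines it only if it is missing; `needs_vremap()` is a made-up helper for illustration:

```c
#include <assert.h>

/*
 * Fallback with the classic KERNEL_VERSION() encoding:
 * major in bits 16-23, minor in bits 8-15, sublevel in bits 0-7.
 * Only defined if the real macro is not already available.
 */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

/* Hypothetical helper: does this kernel still use the old vremap() name? */
int needs_vremap(unsigned int version_code)
{
	return version_code < KERNEL_VERSION(2, 1, 0);
}
```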

And the above sounds worse than it really is.  Most real drivers really
don't do anything all that complex (or rather: the complexity is not so
much in the actual IO accesses as in error handling and timeouts etc).
It's generally not hard to fix drivers, and in many cases the code
actually looks better afterwards::

	unsigned long signature = *(unsigned int *) 0xC0000;
		vs
	unsigned long signature = readl(0xC0000);

I think the second version actually is more readable, no?