Commit | Line | Data |
---|---|---|
cc3d15a5 CH |
1 | Booting from real channel-attached devices on s390x |
2 | =================================================== | |
3 | ||
4 | s390 hardware IPL | |
5 | ----------------- | |
efa47d36 JH |
6 | |
7 | The s390 hardware IPL process consists of the following steps. | |
8 | ||
cc3d15a5 CH |
9 | 1. A READ IPL ccw is constructed in memory location ``0x0``. |
10 | This ccw, by definition, reads the IPL1 record which is located on the disk | |
11 | at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw | |
12 | so when it is complete another ccw will be fetched and executed from memory | |
13 | location ``0x08``. | |
14 | ||
15 | 2. Execute the Read IPL ccw at ``0x00``, thereby reading IPL1 data into ``0x00``. | |
16 | IPL1 data is 24 bytes in length and consists of the following pieces of | |
17 | information: ``[psw][read ccw][tic ccw]``. When the machine executes the Read | |
18 | IPL ccw, it reads the 24 bytes of IPL1 into memory starting at | |
19 | location ``0x0``. Then the ccw program at ``0x08`` which consists of a read | |
20 | ccw and a tic ccw is automatically executed because of the chain flag from | |
21 | the original READ IPL ccw. The read ccw will read the IPL2 data into memory | |
22 | and the TIC (Transfer In Channel) will transfer control to the channel | |
23 | program contained in the IPL2 data. The TIC channel command is the | |
24 | equivalent of a branch/jump/goto instruction for channel programs. | |
25 | ||
26 | NOTE: The ccws in IPL1 are defined by the architecture to be format 0. | |
efa47d36 JH |
27 | |
28 | 3. Execute IPL2. | |
cc3d15a5 CH |
29 | The TIC ccw instruction at the end of the IPL1 channel program will begin |
30 | the execution of the IPL2 channel program. IPL2 is stage-2 of the boot | |
31 | process and will contain a larger channel program than IPL1. The point of | |
32 | IPL2 is to find and load either the operating system or a small program that | |
33 | loads the operating system from disk. At the end of this step all or some of | |
34 | the real operating system is loaded into memory and we are ready to hand | |
35 | control over to the guest operating system. At this point the guest | |
36 | operating system is entirely responsible for loading any more data it might | |
37 | need to function. | |
38 | ||
39 | NOTE: The IPL2 channel program might read data into memory | |
40 | location ``0x0`` thereby overwriting the IPL1 psw and channel program. This is ok | |
41 | as long as the data placed in location ``0x0`` contains a psw whose instruction | |
42 | address points to the guest operating system code to execute at the end of | |
43 | the IPL/boot process. | |
44 | ||
45 | NOTE: The ccws in IPL2 are defined by the architecture to be format 0. | |
efa47d36 JH |
46 | |
47 | 4. Start executing the guest operating system. | |
cc3d15a5 CH |
48 | The psw that was loaded into memory location ``0x0`` as part of the IPL process |
49 | should contain the needed flags for the operating system we have loaded. The | |
50 | psw's instruction address will point to the location in memory where we want | |
51 | to start executing the operating system. This psw is loaded (via LPSW | |
52 | instruction) causing control to be passed to the operating system code. | |
efa47d36 JH |
53 | |
54 | In a non-virtualized environment this process, handled entirely by the hardware, | |
55 | is kicked off by the user initiating a "Load" procedure from the hardware | |
56 | management console. This "Load" procedure crafts a special "Read IPL" ccw in | |
57 | memory location ``0x0`` that reads IPL1. It then executes this ccw thereby kicking | |
58 | off the reading of IPL1 data. Since the channel program from IPL1 will be | |
59 | written immediately after the special "Read IPL" ccw, the IPL1 channel program | |
60 | will be executed immediately (the special read ccw has the chaining bit turned | |
61 | on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel | |
62 | program to be executed automatically. After this sequence completes the "Load" | |
cc3d15a5 | 63 | procedure then loads the psw from ``0x0``. |
efa47d36 | 64 | |
cc3d15a5 CH |
65 | How this all pertains to QEMU (and the kernel) |
66 | ---------------------------------------------- | |
efa47d36 JH |
67 | |
68 | In theory we should merely have to do the following to IPL/boot a guest | |
69 | operating system from a DASD device: | |
70 | ||
cc3d15a5 CH |
71 | 1. Place a "Read IPL" ccw into memory location ``0x0`` with chaining bit on. |
72 | 2. Execute channel program at ``0x0``. | |
73 | 3. LPSW ``0x0``. | |
efa47d36 JH |
74 | |
75 | However, our emulation of the machine's channel program logic within the kernel | |
76 | is missing one key feature that is required for this process to work: | |
77 | non-prefetch of ccw data. | |
78 | ||
79 | When we start a channel program we pass the channel subsystem parameters via an | |
80 | ORB (Operation Request Block). One of those parameters is a prefetch bit. If the | |
81 | bit is on then the vfio-ccw kernel driver is allowed to read the entire channel | |
82 | program from guest memory before it starts executing it. This means that any | |
83 | channel commands that read additional channel commands will not work as expected | |
84 | because the newly read commands will only exist in guest memory and NOT within | |
85 | the kernel's channel subsystem memory. The kernel vfio-ccw driver currently | |
86 | requires this bit to be on for all channel programs. This is a problem because | |
87 | the IPL process consists of transferring control from the "Read IPL" ccw | |
88 | immediately to the IPL1 channel program that was read by "Read IPL". | |
89 | ||
90 | Not being able to turn off prefetch will also prevent the TIC at the end of the | |
91 | IPL1 channel program from transferring control to the IPL2 channel program. | |
92 | ||
93 | Lastly, in some cases (the zipl bootloader for example) the IPL2 program also | |
94 | transfers control to another channel program segment immediately after reading | |
95 | it from the disk. So we need to be able to handle this case. | |
96 | ||
cc3d15a5 CH |
97 | What QEMU does |
98 | -------------- | |
efa47d36 JH |
99 | |
100 | Since we are forced to live with prefetch we cannot use the very simple IPL | |
101 | procedure we defined in the preceding section. So we compensate by doing the | |
102 | following. | |
103 | ||
cc3d15a5 CH |
104 | 1. Place "Read IPL" ccw into memory location ``0x0``, but turn off chaining bit. |
105 | 2. Execute "Read IPL" at ``0x0``. | |
efa47d36 | 106 | |
cc3d15a5 | 107 | So now IPL1's psw is at ``0x0`` and IPL1's channel program is at ``0x08``. |
efa47d36 | 108 | |
cc3d15a5 | 109 | 3. Write a custom channel program that will seek to the IPL2 record and then |
efa47d36 JH |
110 | execute the READ and TIC ccws from IPL1. Normally the seek is not required |
111 | because after reading the IPL1 record the disk is automatically positioned | |
112 | to read the very next record which will be IPL2. But since we are not reading | |
113 | both IPL1 and IPL2 as part of the same channel program we must manually set | |
114 | the position. | |
115 | ||
cc3d15a5 | 116 | 4. Grab the target address of the TIC instruction from the IPL1 channel program. |
efa47d36 JH |
117 | This address is where the IPL2 channel program starts. |
118 | ||
119 | Now IPL2 is loaded into memory somewhere, and we know the address. | |
120 | ||
cc3d15a5 | 121 | 5. Execute the IPL2 channel program at the address obtained in step #4. |
efa47d36 JH |
122 | |
123 | Because this channel program can be dynamic, we must use a special algorithm | |
124 | that detects a READ immediately followed by a TIC and breaks the ccw chain | |
125 | by turning off the chain bit in the READ ccw. When control is returned from | |
126 | the kernel/hardware to the QEMU bios code we immediately issue another start | |
127 | subchannel to execute the remaining TIC instruction. This causes the entire | |
128 | channel program (starting from the TIC) and all needed data to be refetched | |
129 | thereby stepping around the limitation that would otherwise prevent this | |
130 | channel program from executing properly. | |
131 | ||
132 | Now the operating system code is loaded somewhere in guest memory and the psw | |
cc3d15a5 | 133 | in memory location ``0x0`` will point to entry code for the guest operating |
efa47d36 JH |
134 | system. |
135 | ||
cc3d15a5 CH |
136 | 6. LPSW ``0x0`` |
137 | ||
efa47d36 | 138 | LPSW transfers control to the guest operating system and we're done. |
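
The chain-breaking idea from step 5 can be sketched in a few lines. This is a minimal illustration, not the QEMU bios implementation: guest memory is modelled as a Python ``bytearray``, the helper names (``pack_ccw0``, ``unpack_ccw0``, ``break_read_tic_chain``) are hypothetical, and only a single DASD read command code is recognised. It assumes the format-0 ccw layout (1-byte command, 24-bit data address, 1-byte flags, 1 unused byte, 2-byte count) with the command-chaining flag at ``0x40``.

```python
import struct

CCW_SIZE = 8      # a format-0 ccw occupies 8 bytes
CMD_READ = 0x06   # DASD read data (the only read variant handled in this sketch)
CMD_TIC = 0x08    # Transfer In Channel
FLAG_CC = 0x40    # command-chaining flag in the ccw flags byte


def pack_ccw0(cmd, addr, flags, count):
    """Encode a format-0 ccw: command, 24-bit address, flags, pad, count."""
    return (bytes([cmd]) + addr.to_bytes(3, "big")
            + struct.pack(">BBH", flags, 0, count))


def unpack_ccw0(mem, at):
    """Decode the format-0 ccw stored at offset `at` of `mem`."""
    cmd = mem[at]
    addr = int.from_bytes(mem[at + 1:at + 4], "big")
    flags, _pad, count = struct.unpack_from(">BBH", mem, at + 4)
    return cmd, addr, flags, count


def break_read_tic_chain(mem, start, max_steps=64):
    """Find the first READ chained to a TIC and cut the chain there.

    Clears the chain flag of that READ in `mem` and returns the TIC's
    target address, so the caller can run the READ alone and then start
    a fresh channel program at the returned address. Returns None if the
    chain ends (or `max_steps` is exhausted) without such a pair.
    """
    at = start
    for _ in range(max_steps):
        cmd, addr, flags, _count = unpack_ccw0(mem, at)
        if cmd == CMD_TIC:
            at = addr            # follow an inline TIC to its target
            continue
        if not flags & FLAG_CC:
            return None          # end of chain, nothing to break
        next_cmd, next_addr, _f, _c = unpack_ccw0(mem, at + CCW_SIZE)
        if cmd == CMD_READ and next_cmd == CMD_TIC:
            mem[at + 4] = flags & ~FLAG_CC   # turn off chaining in the READ
            return next_addr
        at += CCW_SIZE
    return None


# Toy IPL1 image: psw at 0x0, read ccw at 0x08, tic ccw at 0x10.
mem = bytearray(0x400)
mem[0x08:0x10] = pack_ccw0(CMD_READ, 0x200, FLAG_CC, 0x100)
mem[0x10:0x18] = pack_ccw0(CMD_TIC, 0x200, 0, 1)
ipl2_addr = break_read_tic_chain(mem, 0x08)
```

In this toy layout the read ccw at ``0x08`` loses its chain flag and ``ipl2_addr`` is the TIC target, which mirrors steps 3 to 5: run the now-unchained read, then start a new channel program at the returned address so that the freshly read ccws are fetched from guest memory. Breaking a chain can change the meaning of an arbitrary channel program, so a heuristic like this is only reasonable for the simple programs a boot loader emits.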