ISA Hardware

8237 DMA Transfers Across Page Boundaries


The 8237 DMA controller in the original PC/XT (and its clones) is fundamentally an 8-bit device with a 16-bit address space – perfectly matched to the MCS 85 family of which it was a part.  So to make it work with the 20-bit address space of the 8086 and 8088, IBM added a 4-bit ‘page register’ for each of its four DMA channels using a 74LS670 (a quad 4-bit register file).

The 8237 and the 74LS670 are largely independent, though: the page register does not automatically increment when the address register wraps around to zero.  This has two implications: normal segment:offset addresses must be converted to a linear, 20-bit physical address, and a single DMA transfer cannot cross a 64 KB page boundary.

Determining the Physical Buffer Address

[Figure: converting a segment:offset address to a linear address for the DMA controller]

Code in the XTIDE Universal BIOS illustrates how to convert a standard segment:offset address (presented in ES:SI) to a linear address, with just the 8088 instruction set:

     xor        dx, dx      ; clear DX
     mov        ax, es      ; copy ES to AX
 %rep 4 
     shl        ax, 1       ; shift left 1, MSB into carry...
     rcl        dx, 1       ; ...and from carry to DX LSB
 %endrep                    ; repeat for the 4 MSB bits
                            ; AX now has ES SHL 4, and DX has ES SHR 12 
     add        si, ax      ; add AX to SI, to get low 16-bits in SI
     adc        dl, dh      ; if it overflowed, increment DX (DH is zero)
     mov        es, dx      ; and save DX back in ES
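As a cross-check of the shift-and-add sequence above, the same arithmetic can be modelled in C on a modern host. This is only a sketch for verifying values by hand, not code from the XTIDE Universal BIOS; the `dma_address` type and the `dma_linear_address` function are hypothetical names introduced here.

```c
#include <stdint.h>

/* Hypothetical helper type, not taken from the XTIDE Universal BIOS. */
typedef struct {
    uint8_t  page;    /* physical address bits 19..16, for the page register */
    uint16_t offset;  /* physical address bits 15..0, for the 8237 address register */
} dma_address;

/* Convert a real-mode segment:offset pair into a page number and a
   16-bit offset. The mask models the 20-bit address bus: only the low
   four bits of the page value reach the 4-bit page register. */
static dma_address dma_linear_address(uint16_t segment, uint16_t offset)
{
    uint32_t physical = (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
    dma_address result;
    result.page   = (uint8_t)(physical >> 16);
    result.offset = (uint16_t)(physical & 0xFFFFu);
    return result;
}
```

For example, 1234h:F000h is physical address 21340h, so `dma_linear_address(0x1234, 0xF000)` yields page 2 and offset 1340h, the same values the assembly leaves in DL and SI.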

DX needs to end up holding the top four bits of the physical address in its low nibble (ES SHR 12, plus any carry from the addition) because IBM hooked up the 74LS670 DMA page register to the low four bits of the data bus, so the high 4 bits of the physical address are programmed from the low 4 bits of a CPU register.  The addresses are then loaded into the DMA controller address register (in two halves, since the DMA controller has only an 8-bit data bus) and the associated page register.  In this example, the port addresses are for channel 3:

     out        0Ch, al                ; Reset flip-flop to low byte 
     mov        ax, es                 ; Get high 4 bits
     out        82h, al                ; Page register for Ch.3 
     mov        ax, si                 ; Get low 16 bits
     out        06h, al                ; Send low byte to Ch.3 address register
     mov        al, ah                 ; Move high byte of address into AL
     out        06h, al                ; Send high byte to Ch.3 address register
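The order of the port writes matters, because the 8237 uses its flip-flop to decide whether a write to the address register is the low or the high byte. The following C sketch records the sequence rather than performing it, so it can be checked on any host. The port numbers are those in the listing above; the `port_write` type and the `dma_ch3_address_writes` function are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Port numbers as used in the listing above (channel 3). */
enum {
    DMA_CLEAR_FLIPFLOP = 0x0C,
    DMA_CH3_PAGE       = 0x82,
    DMA_CH3_ADDRESS    = 0x06
};

/* Hypothetical record of one OUT instruction; a real driver would
   perform the port write instead of storing it. */
typedef struct {
    uint8_t port;
    uint8_t value;
} port_write;

/* Fill out[] with the writes needed to load a 20-bit physical address
   into channel 3, in the order used above. Returns the number of writes. */
static size_t dma_ch3_address_writes(uint32_t physical, port_write out[4])
{
    /* The value written to the flip-flop port is ignored by the 8237. */
    out[0] = (port_write){ DMA_CLEAR_FLIPFLOP, 0 };
    /* Only the low four bits reach the 4-bit page register. */
    out[1] = (port_write){ DMA_CH3_PAGE, (uint8_t)((physical >> 16) & 0x0F) };
    /* Low byte first, then high byte, through the same port. */
    out[2] = (port_write){ DMA_CH3_ADDRESS, (uint8_t)(physical & 0xFF) };
    out[3] = (port_write){ DMA_CH3_ADDRESS, (uint8_t)((physical >> 8) & 0xFF) };
    return 4;
}
```

For physical address 21340h this gives 02h to port 82h, then 40h and 13h to port 06h.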

Crossing a 64KB Boundary

Since the page register isn’t incremented by the DMA controller, a DMA transfer can run up to a page boundary at which point it (and the associated page register) must be re-programmed for another transfer into the next physical page.  Splitting a transfer across a boundary therefore requires a check of the transfer size against the possible number of bytes up to a page boundary.

The code that follows assumes the maximum total transfer size is less than 64KB, so it allows for either one or two DMA transfers.

    ; On entry - physical buffer address is in ES:DI (page in ES, low
    ; 16 bits in DI) and CX has bytes to transfer
    ; First calculate bytes up to physical page boundary
    mov        ax, di 
    neg        ax                 ; 2s complement

    ; if DI was zero, carry flag will be cleared (and set otherwise)
    ; When DI is zero only one transfer is required if total DMA
    ; transfer size is restricted to < 64KB
    jnc    .TransferDmaPageWithSizeInCX

    ; CF was set, so DI != 0 and we might need one or two transfers
    cmp        cx, ax                    ; if won't cross physical page boundary... 
    jbe    .TransferDmaPageWithSizeInCX  ; ...perform transfer in one operation 

    ; Calculate how much we can transfer on first and second rounds 
    xchg        cx, ax            ; CX = BYTEs for first page 
    sub         ax, cx            ; AX = BYTEs for second page 
    push        ax                ; Save bytes for second transfer on stack 

    ; Transfer first DMA page 
    call    StartDMAtransfer 
    pop         cx                ; Pop size for second DMA page 

.TransferDmaPageWithSizeInCX: 
    ; Fall to StartDMAtransfer 

StartDMAtransfer:
    ; DMA controller programming and transfer is completed here
    ; This code will be hardware dependent
    ; ...

    ; Once transfer is done, update physical address in ES:DI
    ; since IO might need several calls through this function
    ; (if crossing a physical page boundary)
    mov        ax, es             ; copy physical page address to ax 
    add        di, cx             ; add requested bytes to di 
    adc        al, 0              ; increment physical page address, if required 
    mov        es, ax             ; and save it back in es 

    ret
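The splitting decision in the listing above can be modelled in C to check the byte counts for a given buffer. This is a sketch under the same assumption as the text (total size below 64KB); `dma_split` and `dma_split_at_page` are hypothetical names, and the model covers only the size calculation, not the transfer itself.

```c
#include <stdint.h>

/* Hypothetical result type: sizes of the one or two DMA transfers. */
typedef struct {
    uint32_t first;   /* bytes transferred before the page boundary */
    uint32_t second;  /* bytes in the next page; 0 if no split is needed */
} dma_split;

/* offset is the low 16 bits of the physical address (DI above);
   total is the byte count (CX above), assumed to be 1..65535. */
static dma_split dma_split_at_page(uint16_t offset, uint16_t total)
{
    /* Bytes remaining in the current 64KB page. This is 65536 when
       offset is zero, the case the assembly detects with NEG and JNC. */
    uint32_t to_boundary = 0x10000u - offset;
    dma_split s;

    if (total <= to_boundary) {
        s.first  = total;                 /* fits in one transfer */
        s.second = 0;
    } else {
        s.first  = to_boundary;           /* run up to the boundary... */
        s.second = total - to_boundary;   /* ...then finish in the next page */
    }
    return s;
}
```

A buffer at offset F000h with 2000h bytes to move splits into two transfers of 1000h bytes each, whereas the same buffer with only 1000h bytes needs a single transfer that ends exactly on the boundary.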

Purpose

The DMA controller in the original IBM PC has a few reasons for being – RAM refresh of course, background transfers (as used by Sound Blaster sampled audio, for example), and high-performance transfers.  The 8237 DMA controller is usually noted for its lack of performance, but that perception came about because CPU speed soon eclipsed it.

In a 4.77MHz 8088 system, the DMA controller is the only way to transfer data to or from a peripheral in consecutive, back-to-back full-speed bus cycles (in later PCs, the DMA controller is throttled to about 5MHz to ensure peripheral compatibility).  Of course it’s made more difficult by the boundary crossing issues and the requirement to pause for RAM refresh, but the controller can provide the fastest possible transfers, as demonstrated by my XT-CFv3 DMA Transfer Mode.

Understanding the IBM PC/XT 5160 Slot 8


When IBM released the PC/XT 5160, the 8 ISA slots were a welcome improvement from the 5 in the original IBM PC 5150; multi-function cards weren’t yet common and the system board itself provided no IO capabilities other than the speaker and keyboard, so everything needed an expansion card.

Whilst seven of the slots operated just like those in the 5150, the slot nearest the CPU was special.  The IBM Technical Reference notes that slot J8 is slightly different from the others, in that any card placed in it is expected to respond with a ‘card selected’ signal whenever the card is selected.  The key is a previously unused signal line, B8:

CARD SLCTD (I) Card Selected: This line is activated by cards in expansion slot J8. It signals the system board that the card has been selected and that appropriate drivers on the system board should be directed to either read from, or write to, expansion slot J8. Connectors J1 through J8 are tied together at this pin, but the system board does not use their signal. This line should be driven by an open collector device.

By observation, B8 needs to be asserted (low) only when reading from a device in slot 8, which then sets the direction of the buffer U15 to transfer data from the XD bus (housing slot J8, the system ROMs and the DMA controller) to the D bus (slots 1 to 7).

Open Collector Drive

For novices like me, the last part of IBM’s text is important: the open collector drive (see evilmadscientist.com for a good description of this).  As other logic can drive B8, such as the ROM address decoder, a device driving ISA B8 can’t present a high-level drive when not asserted (the signal level is pulled up through RN1).  This can be achieved either directly using open-collector logic, or by using a separate buffer.
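The wired behaviour of an open-collector line can be modelled in a few lines of C. This is a minimal sketch of the general idea only, not the logic of any particular card: the names are hypothetical, and the drive condition in `card_slctd_drive` is an assumption taken from the observation above that B8 need only be asserted when reading from the slot 8 device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each open-collector driver either pulls the line low or releases it;
   it never drives the line high. */
typedef enum { OC_RELEASED, OC_PULL_LOW } oc_drive;

/* Resolve the level of a pulled-up line shared by n open-collector
   drivers: high (true) only if every driver has released it. */
static bool oc_line_level(const oc_drive *drivers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (drivers[i] == OC_PULL_LOW) {
            return false;   /* any one driver can assert the line */
        }
    }
    return true;            /* the pull-up wins when all have released */
}

/* Assumed condition: a slot 8 card asserts B8 (low) only while it is
   selected and being read, and otherwise releases the line. */
static oc_drive card_slctd_drive(bool card_selected, bool read_cycle)
{
    return (card_selected && read_cycle) ? OC_PULL_LOW : OC_RELEASED;
}
```

Because no driver ever forces the line high, two devices can share it without contention: the line reads low whenever either one asserts it.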

In my CPLD logic for the XT-CFv3, the logic looks like this:

[Figure: ISA B8 CPLD logic for the XT-CFv3]

In the Lo-tech CompactFlash Adapter, I’ve gone with a separate buffer by distilling the logic down to some NOR gates, to minimise component count (the NOR gate also providing LED drive):

[Figure: Lo-tech CompactFlash Adapter ISA B8 logic]

There’s probably a better way, but both of those work anyhow 🙂

Zero Wait State

The CARD SLCTD use of B8 was pretty much limited to the IBM PC/XT 5160 – the later PC/AT re-purposed the line as a “zero wait state” line, which is of particular advantage to 8-bit cards.

ISA at the time was effectively a local bus, so as IBM started pushing processor clock rates up, it chose to limit the effective bus speed for 8-bit cards by adding wait states to achieve a bus cycle time of about 750ns, roughly equivalent to the 4.77MHz speed of the original PC.  At the time, expansion cards could be quite an investment, so keeping things compatible was key.

Through the ZWS (B8) line in the PC/AT, and using the same basic logic as used to generate CARD SLCTD, the wait states can be eliminated for reads; extending the logic to consider IOW as well eliminates the wait states on writes too.  This can boost throughput by 30% on a 6MHz AT, and 50% on a later 12MHz AT class machine.

More Information