THE RANT / THE SCHPLOG
Schmorp's POD Blog a.k.a. THE RANT
a.k.a. the blog that cannot decide on a name

This document was first published 2015-11-27 04:07:37, and last modified 2015-12-21 00:03:01.

Emulating VT102 Hardware in Perl - Part 3: The CPU

This third part of the series uses the variables and opcode table to implement the virtual CPU, which is also the mainloop of the emulator.

The Mainloop

At its heart, the vt102 emulator is an endless loop:

my @ICACHE; # compiled instruction/basic block cache

my $POWERSAVE; # powersave counter

my $RIN; # libev for the less well-off

(vec $RIN,           0, 1) = 1 if $KBD;
(vec $RIN, fileno $PTY, 1) = 1 if $PTY;

# the cpu.
while () {
   ...
}

The variables it uses are @ICACHE, which contains code references representing basic blocks; $POWERSAVE, a helper counter used to decide when to suspend execution while idle; and $RIN, the bitmask used to select (poll) on STDIN for keyboard input and on the pty that the program running inside the emulator is connected to.
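
For readers who haven't used vec with select before, here is a minimal sketch of how such a read mask is built. The descriptor number 5 is a made-up stand-in for whatever fileno $PTY happens to return:

```perl
use strict;
use warnings;

# Build a select() read mask for two file descriptors.
# fd 0 is STDIN; fd 5 is a hypothetical stand-in for the pty's fileno.
my $rin = "";
vec($rin, 0, 1) = 1;
vec($rin, 5, 1) = 1;

# Each descriptor owns one bit; "b*" lists the bits lowest-first.
my $bits = unpack "b*", $rin;
print "$bits\n";   # prints 10000100
```

select later hands back a mask of the same shape, with only the bits of the readable descriptors still set.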

Each iteration of the loop executes an extended basic block, which is represented by a code reference stored at $ICACHE[$PC], which updates the CPU state and returns the new program counter:

# execute an extended basic block
$PC = ($ICACHE[$PC] ||= do {
   ... generate the coderef
})->();

What this does is fetch the coderef from $ICACHE[$PC], generating it if necessary, and then simply call it. This doesn't handle self-modifying code, but since all code is in ROM, no modifications are going on.
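
The caching idiom can be tried in isolation. In this sketch, compile_block is a hypothetical stand-in for the real code generator, and the three-address "program" is invented purely to show that each block is compiled only once:

```perl
use strict;
use warnings;

my @icache;          # address => compiled block
my $compiled = 0;    # how many times the stub compiler ran

# Hypothetical stand-in for the code generator: the block at
# address $pc simply "jumps" to ($pc + 1) % 3.
sub compile_block {
   my ($pc) = @_;
   ++$compiled;
   my $next = ($pc + 1) % 3;
   return sub { $next };
}

my $pc = 0;
for (1 .. 9) {
   # fetch the cached block, compiling it on first use, then run it
   $pc = ($icache[$pc] ||= compile_block($pc))->();
}

print "compiled $compiled blocks, pc is $pc\n";
```

Nine blocks are executed, but the compiler only runs for the three distinct addresses.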

The magic of the JIT compiler is inside the do block. It is conceptually simple: it takes a number of instructions that are executed sequentially and compiles them into a single coderef, by looking them up in the opcode table and joining them together.

The only difficulty is the branches, which change code execution, but can be handled by updating the $PC and simply returning from the generated code reference. Some instructions permanently change the flow of execution, such as call and ret (return) - these end the extended basic block, and also code generation.

In addition, only up to 32 instructions are processed per block - more would make the emulator a bit faster, but would also use a lot more memory. Fewer would make it somewhat slower, but would conserve memory.

Having said that, let's look at the code generation:

   my $pc = $PC;

   my $insn = "";

   # the jit compiler
   for (0..31) {
      my $imm;

First we make a copy of the program counter - $PC is the actual program counter, while $pc contains the address of the instruction that is currently being translated. They start at the same value (we start compiling the basic block where the CPU currently executes), but diverge with every instruction - $PC doesn't change, while the address of the instruction currently being compiled does.

The variable $insn is a bit of a misnomer - it contains all instructions from the block we compile, not just a single one.

Then we iterate 32 times, over the next 32 instructions in memory. Each instruction is then translated, and the first step is to fetch the opcode string by using the opcode in memory at the current pc ($M[$pc]) as index into the opcode table @op:

      my $op = $op[$M[$pc++]];

The opcode table contains strings that are mostly valid perl code, except for some macros that expand to frequently used expressions. Here are some examples (format: address, opcode, corresponding string), starting at address 0 in the vt102 ROM:

0000 f3 $IFF = 0
0001 31 $SP      =  IMM16
0004 c3 JMP IMM16
003b 1e $E = IMM8
003d f3 $IFF = 0
003e 3e $A = IMM8
0040 d3 OUT
0042 2f $A ^= 0xff
0043 d3 OUT
0045 af sf8 $A ^=              $A

And a few more complex examples:

004f 07           $FC =       $A & 0x80; $A = (($A << 1) + ($FC && 0x01)) & 0xff
0052 c2 BRA IMM16 if !$FZ
00b9 d5 PUSH $D; PUSH $E
00ba cd (PUSH PC >> 8), (PUSH PC & 0xff), (BRA IMM16)
02b9 21 ($H, $L) = (IMM16 >> 8, IMM16 & 0xff)
02bc 22 ($M[IMM16], $M[IMM16 + 1]) = ($L, $H)
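
To make the rotate instruction at 004f a bit more concrete, here is a small sketch that wraps its opcode string in a sub. The name rlc and the argument passing are mine; the emulator runs the string directly against the global $A and $FC:

```perl
use strict;
use warnings;

# Opcode 07 (rotate accumulator left), wrapped in a sub for illustration.
# Takes the accumulator value and returns the new ($A, $FC) pair.
sub rlc {
   my ($A) = @_;
   my $FC = $A & 0x80;                        # bit 7 becomes the carry
   $A = (($A << 1) + ($FC && 0x01)) & 0xff;   # and wraps around to bit 0
   return ($A, $FC);
}

my ($a1, $c1) = rlc(0x81);
my ($a2, $c2) = rlc(0x40);
printf "%02x carry=%d\n", $a1, $c1 ? 1 : 0;   # prints 03 carry=1
printf "%02x carry=%d\n", $a2, $c2 ? 1 : 0;   # prints 80 carry=0
```

Note that the carry flag is stored as 0x80 or 0 rather than 1 or 0, which is fine as long as it is only ever tested for truth.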

What follows is a lot of regex substitutions that replace the macros in the opcode strings by valid perl code:

      for ($op) {
         s/\bPUSH\b/\$M[--\$SP] =/g; # push byte to stack
         s/\bPOP\b/\$M[\$SP++]/g;    # pop byte from stack
     
         s/\bIMM16\b/$imm \/\/= $M[$pc++] + $M[$pc++] * 256/xge; # 16 bit insn immediate
         s/\bIMM8\b /$imm \/\/= $M[$pc++]                  /xge; #  8 bit insn immediate
     
         s/\bPC\b/$pc/ge; # PC at end of insn
         s/\bBRA\b/return/g; # conditional jump
         s/\bJMP\b(.*)/$1\x00/sg; # unconditional jump
     
         s/\bIN\b/ sprintf "\$A = in_%02x", $M[$pc++]/xge; # in insns call in_HEX
         s/\bOUT\b/sprintf "out_%02x \$A ", $M[$pc++]/xge; # out likewise
      }
     
      $insn .= "$op;\n";

The for is used only for its side effect of aliasing $op to $_, so I don't have to write $op =~ before each regex substitution - it's kind of like the Pascal with statement.

PUSH, POP, IMM16 and IMM8 are pretty straightforward - immediate operands are simply fetched from memory while incrementing the virtual $pc, so it correctly points to the next instruction.
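
One detail worth seeing in action is the $imm //= part: an opcode string may mention IMM16 more than once, but the immediate must only be fetched once. The following sketch runs the same substitution on the string for opcode 21, against a three-byte toy memory that I made up for the example:

```perl
use strict;
use warnings;

# Toy memory holding "21 34 12": opcode 21 followed by the 16 bit
# immediate 0x1234, low byte first. (Invented for this example.)
my @M  = (0x21, 0x34, 0x12);
my $pc = 1;       # already past the opcode byte
my $imm;          # per-instruction cache for the immediate

my $op = '($H, $L) = (IMM16 >> 8, IMM16 & 0xff)';

# Same substitution as in the emulator: the first IMM16 fetches two
# bytes and advances $pc, the second one reuses the cached $imm.
$op =~ s/\bIMM16\b/$imm \/\/= $M[$pc++] + $M[$pc++] * 256/xge;

print "$op\n";    # prints ($H, $L) = (4660 >> 8, 4660 & 0xff)
print "$pc\n";    # prints 3
```

Without the cache, the second IMM16 would read two more bytes and $pc would end up pointing into the middle of the next instruction.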

Similarly, PC is replaced by the (numerical) program counter, while IN and OUT call perl functions of the name in_HEX and out_HEX, which then implement the corresponding hardware function.

Conditional branches, calls and returns (using BRA) are the most complicated - when taken, they update the $PC to the branch target and return. For instance, this branch:

0052 c2 BRA IMM16 if !$FZ ; jnz     X004f

Will be replaced by this perl code, which returns the new program counter:

return 79 if !$FZ

Unconditional jumps (using JMP) can be optimized further - since they never execute instructions following them, they will be marked using a NUL byte, which is later used to truncate the perl code at that position.

For example, this jump instruction:

0004 c3 JMP IMM16 ; jmp     X003b

Will result in this perl code, which returns the new $PC value:

59\x00

It isn't valid due to the binary zero, but that will be removed later. Since this will then be the last statement in the generated code, it doesn't need to use a return statement.

After the opcode has been mangled and updated, it will finally be appended to $insn, and the next opcode will be converted.

After the loop, we end up with an $insn string that contains up to 32 instructions. If there isn't any embedded unconditional jump, the virtual $pc at that point is the address where execution continues, that is, the new value for the actual $PC, so it is returned:

   $insn .= $pc;

And lastly, if there was an unconditional jump, then all instructions after it in memory will never be executed, so they can be removed:

   $insn =~ s/\x00.*$//s;

What remains is to compile this code and return it, so it can be cached in $ICACHE[$PC]:

   eval "sub { $insn }" or die "$insn: $@"
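
The last three steps (append the fall-through address, truncate at the NUL, compile) can be tried on a hand-written string. This is only a sketch: the $insn contents are made up in the shape the JIT produces, and $A is declared here so the example runs on its own:

```perl
use strict;
use warnings;

our $A = 0;   # stand-in for the emulator's accumulator

# One normal instruction, an unconditional jump marked by a NUL byte,
# then an instruction that can never be reached.
my $insn = "\$A = 15;\n 59\x00;\n\$A = 99;\n";

$insn .= 3;                  # fall-through address, dead code in this case

$insn =~ s/\x00.*$//s;       # drop everything from the NUL onwards

my $block = eval "sub { $insn }" or die "$insn: $@";
my $next  = $block->();

print "next pc $next, A is $A\n";   # prints next pc 59, A is 15
```

After the truncation, the jump target is the last expression in the sub, so it becomes the return value without an explicit return.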

Here are the first two actual compiled basic blocks, again starting at the reset vector. The first is short, because it only disables interrupts, loads the stack pointer and quickly jmp's to the firmware init function at address 59:

$IFF = 0;
$SP      =  8270;
 59

The second, the first block of the firmware init function, is much longer:

$E = 1;
$IFF = 0;
$A = 15;
out_62 $A ;
$A ^= 0xff;
out_42 $A ;
sf8 $A ^=              $A;
$D = $A;
$L = $A;
$H = $A;
sf_nc ++$A;
$B = $A;
out_82 $A ;
$C = 8;
          $FC =       $A & 0x80; $A = (($A << 1) + ($FC && 0x01)) & 0xff ;
sf8 $A ^=              $M[$H*256+$L];
sf_nc ++$L;
return 79 if !$FZ;
sf_nc ++$H;
sf_nc --$C;
return 79 if !$FZ;
sf8 $A |=              $A;
return 91 if !$FZ;
$A = $B;
sf  $A -  4;
return 73 if !$FZ;
sf_nc ++$A;
out_82 $A ;
$C = 170;
$B = 44;
$A = in_42;
sf8 $A &= 2;
111

This concludes the JIT compiler, and also finishes the description of actual CPU instruction execution.

Time Flies Like an Arrow, Fruit Flies Like a Banana: I/O and Hardware Emulation

The second (of three) parts of the mainloop deals with hardware - video updates, keyboard and serial line. It only runs every 16 basic blocks - more often is unnecessary (it takes the firmware a while to update the screen for each received character), and less often would slow down character processing.

++$CLK;

# things we do from time to time only
unless ($CLK & 0xf) {

Inside this block, and even less often (every 4096 basic blocks), we poll for I/O:

   # do I/O
   unless ($CLK & 0xfff) {
      if (select $x = $RIN, undef, undef, $POWERSAVE < 10 ? 0 : $CURSOR_IS_ON && 3600) {

The select statement polls for STDIN (the keyboard) and the pty (the program connected to the terminal). The variable $POWERSAVE is zeroed each time there is "activity" (such as keyboard input or input from the serial pty), and otherwise incremented. After ten iterations without any activity, the select statement is allowed to block the process for up to one hour, which is the power save mode mentioned earlier.

As a special feature of powersave mode, the variable $CURSOR_IS_ON is true while the cursor is visible. It is used to keep the emulator out of powersave mode while the cursor is in the "off" phase of its blink cycle, so that the cursor stops blinking in the visible state, as opposed to becoming invisible.
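
The timeout expression is compact enough to be easy to misread, so here it is pulled out into a function. The name select_timeout is mine; the arguments mirror $POWERSAVE and $CURSOR_IS_ON:

```perl
use strict;
use warnings;

# Timeout handed to select(), as a function of the two state variables.
sub select_timeout {
   my ($powersave, $cursor_is_on) = @_;
   return $powersave < 10 ? 0 : $cursor_is_on && 3600;
}

print select_timeout( 3, 1), "\n";   # recent activity: 0, poll only
print select_timeout(10, 1), "\n";   # idle, cursor visible: 3600
print select_timeout(10, 0), "\n";   # idle, cursor hidden: 0, keep running
```

A timeout of 0 makes select return immediately, so the emulator keeps executing until the firmware has switched the cursor back on.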

After the select, the program first checks for serial input (input from the pty/the program running inside):

         # pty/serial I/O
         if ($PTY && (vec $x, fileno $PTY, 1) && (@PUSARTRECV < 128) && !@KQUEUE) {
            sysread $PTY, my $buf, 256;

            # linux don't do cs7 and/or parity anymore, so we need to filter
            # out xoff characters to avoid freezes.
            push @PUSARTRECV, grep { ($_ & 0x7f) != 0x13 } unpack "C*", $buf;
         }

If there is input, it will read up to 256 bytes and feed these into @PUSARTRECV, which is consulted by the serial line hardware emulation each time the VT100 checks for a character from the serial line.

Since Linux does not support 7-bit characters or parity anymore, we need to filter out "accidental" XOFF characters, that is, 8-bit characters that only look like XOFF when you are a VT100 terminal and ignore the 8th bit. If we don't do that, the terminal might seem to "lock up".
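
Here is the filter applied to a small made-up buffer, to show that both the real XOFF (0x13) and its 8-bit lookalike (0x93) are dropped:

```perl
use strict;
use warnings;

# 0x13 is XOFF; 0x93 is a different 8 bit character that turns into
# XOFF once the top bit is ignored. The buffer contents are invented.
my $buf = "A\x13B\x93C";

my @recv = grep { ($_ & 0x7f) != 0x13 } unpack "C*", $buf;

print join(",", @recv), "\n";   # prints 65,66,67
```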

The other thing to check for is keyboard input:

         # keyboard input
         if ($KBD && (vec $x, 0, 1)) {
            # to avoid non-blocking mode on stdin (and stty min 0), we
            # just read byte-by-byte after a select says there is data.
            while (select my $rin = "\x01", undef, undef, 0) {
               sysread STDIN, $STDIN_BUF, 1, length $STDIN_BUF
                  or last;
            }

            stdin_parse if length $STDIN_BUF;
         }

Conceptually this is simple as well: the script reads the keyboard input (or whatever STDIN connects to) and feeds it into a queue ($STDIN_BUF). Of course the real world interferes again, forcing the script to read character-by-character and to use select between each read to see if there is more.

After this is done, the stdin_parse function will try to convert the (urxvt, xterm, ...) key sequences received into VT100 hardware keycodes (more on that in part 4 of this series).

What remains is to reset $POWERSAVE, as we just had some activity:

         $POWERSAVE = 0; # activity

Next, if there wasn't any I/O event, but we still have outstanding serial line data (@PUSARTRECV) or keyboard input (@KQUEUE), we also must not idle, and also have to reset $POWERSAVE:

      } elsif (@PUSARTRECV || @KQUEUE) {
         $POWERSAVE = 0;

Only when serial line and keyboard are both idle can we increment $POWERSAVE:

      } else {
         ++$POWERSAVE;
      }

After this, we have handled all I/O, but some other hardware components might need attention. First, if there is outstanding data on the serial line, we signal this to the VT100 firmware via interrupt #2:

   # kick off serial input interrupt quite often
   # VT100, but works on vt102, too (probably not used on real hardware though)
   $RST |= 2 if @PUSARTRECV && $XON;

   # VT102, 6.5 rxrdy
   # $INTPEND |= 2 if @PUSARTRECV && $XON;

The VT102 code (that issues half interrupts) isn't used, as the VT100 code works fine with the VT102 ROM, even though the real hardware probably doesn't use interrupt #2.

Signaling this interrupt too often is not healthy, as the VT100 firmware can't cope with the resulting high speeds - the interrupt handler only writes into a queue and might even enable interrupts temporarily, which leads to lost characters or stack overflows due to recursion. The interrupt frequency, and thus the serial line speed in the emulator, is already much higher than what the real hardware has to cope with, but trying the emulator at realistic speeds of 9600 baud or so is no fun.

An additional complication is the race between enabling/allowing interrupts (ei) and the ret instruction that ends an interrupt handler - if an interrupt would occur between ei and ret this would use stack space, and if it happened too often, it would overflow the stack area, which isn't very big to begin with.

A real 8080 handles this by simply not checking for interrupts directly after an ei instruction, so ei; ret is a safe combination. The emulator handles this by making basic blocks the right length so they include both ei and ret, and only checking for interrupts between such blocks.

Two more things are being done regularly. First, a vertical retrace interrupt is generated from time to time, but more often than it would be generated in the real hardware. This is necessary to speed up some processing inside the firmware such as some forms of scrolling that can only happen during vertical retrace.

   # kick off vertical retrace interrupt from time to time
   unless ($CLK & 0x1ff) {
      $RST |= 4; # vertical retrace
   }

And lastly, but most importantly, occasionally vt102 will read the video buffer (or rather, execute the display list), so we can watch the output:

   # handle video hardware
   unless ($CLK & 0x3fff) {
      display;
   }

And this concludes the I/O and powersave handling.
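
To get a feeling for how the $CLK masks used in this section relate to each other, here is a small simulation that counts how often each task would run over 0x4000 basic blocks. The counter names are mine, and the tasks themselves are replaced by simple increments:

```perl
use strict;
use warnings;

my $clk = 0;
my %fired = (hardware => 0, io => 0, vretrace => 0, display => 0);

# Simulate 0x4000 basic blocks and count how often each task would run.
for (1 .. 0x4000) {
   ++$clk;
   unless ($clk & 0xf) {
      ++$fired{hardware};
      ++$fired{io}       unless $clk & 0xfff;
      ++$fired{vretrace} unless $clk & 0x1ff;
      ++$fired{display}  unless $clk & 0x3fff;
   }
}

printf "%-8s %d\n", $_, $fired{$_} for sort keys %fired;
```

The hardware block runs 1024 times, the vertical retrace is raised 32 times, I/O is polled 4 times and the display is redrawn once.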

Interrupt Handling

The third and last part of the mainloop is interrupt handling. Again conceptually simple - if an interrupt is pending we push the current $PC to the stack and continue execution at the interrupt vector. Of course, this simple event is complicated by the fact that there are two sources of interrupts, and by the fact that the firmware interrupt handling is buggy.

Anyway, the first thing is to check for any pending interrupts - if no interrupts are pending, which is the most common case, the whole block can be skipped:

   if (($RST || ($INTPEND & ~$INTMASK)) && $IFF) {
      # rst 1 kbd data available
      # rst 2 pusart xmit+recv flag
      # rst 4 vertical retrace
      # 5.5   vt125 mb7 trans ready (serial send?)
      # 6.5   vt125 mb7 read ready (something modem?)
      # 7.5   vt125 mb7 vblank h(?)
      # trap  vt125 mbi init h(?)
      my $vec;

      my $pend = $INTPEND & ~$INTMASK;

Inside the block, the pending set of half interrupts is calculated - the pending set of "normal"/"full" interrupts already is in $RST. Those sets are then used to decide on which interrupt vector to invoke:

   if      ($pend & 1) { $vec = 0x2c; $INTPEND &= ~1;
   } elsif ($pend & 2) { $vec = 0x34; $INTPEND &= ~2;
   } elsif ($pend & 4) { $vec = 0x3c; $INTPEND &= ~4;
#  } elsif ($RST     ) { $vec = $RST * 8; $RST = 0; # the vt102 firmware doesn't like combined interrupts
   } elsif ($RST  & 1) { $vec = 0x08; $RST     &= ~1; # separate is better for vt102
   } elsif ($RST  & 2) { $vec = 0x10; $RST     &= ~2;
   } elsif ($RST  & 4) { $vec = 0x20; $RST     &= ~4;
   } else {
      die;
   }

As mentioned earlier, the VT102 firmware does not handle combined interrupts, so the commented out line cannot be used, and each interrupt source is signalled separately.
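
The priority order can also be written as a pure function, which makes it easy to try out. This is a sketch of the same selection logic, not the emulator's code: pick_vector is a made-up name, and it works on copies of the two sets instead of updating $INTPEND and $RST in place:

```perl
use strict;
use warnings;

# Pick one interrupt vector. $pend holds the unmasked half interrupts
# (bit 0 = 5.5, bit 1 = 6.5, bit 2 = 7.5), $rst the pending rst requests.
# Returns the vector plus both sets with the chosen bit cleared.
sub pick_vector {
   my ($pend, $rst) = @_;
   for my $bit (0 .. 2) {
      my $mask = 1 << $bit;
      return (0x2c + 8 * $bit, $pend & ~$mask, $rst) if $pend & $mask;
   }
   for my $bit (0 .. 2) {
      my $mask = 1 << $bit;
      return (8 << $bit, $pend, $rst & ~$mask) if $rst & $mask;
   }
   return;   # nothing pending
}

# serial input (rst 2) and vertical retrace (rst 4) pending together
my ($vec, $pend, $rst) = pick_vector(0, 2 | 4);
printf "vector %02x, rst left %d\n", $vec, $rst;   # prints vector 10, rst left 4
```

Only one source is serviced per call, so the vertical retrace in this example stays pending until the next pass through the mainloop.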

After deciding on the interrupt vector, all that is left is to push the current $PC to the stack, disable interrupts, and set the $PC to the interrupt vector address:

   # jump to the interrupt vector
   $M[--$SP] = $PC >> 8;
   $M[--$SP] = $PC & 0xff;
   $PC = $vec;

   $IFF = 0;
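
As a quick check of the byte order, here is the same sequence run against a toy memory, followed by the read a later ret would do. The sub name enter_interrupt and all the values are made up for the example:

```perl
use strict;
use warnings;

my @M = (0) x 0x3000;                        # toy memory
my ($PC, $SP, $IFF) = (0x1234, 0x2000, 1);   # invented CPU state

sub enter_interrupt {
   my ($vec) = @_;
   $M[--$SP] = $PC >> 8;     # high byte ends up at the higher address
   $M[--$SP] = $PC & 0xff;   # low byte ends up at the top of the stack
   $PC  = $vec;
   $IFF = 0;                 # no further interrupts until the next ei
}

enter_interrupt(0x10);

# What a later ret would pop: low byte first, then high byte.
my $return_to = $M[$SP] + $M[$SP + 1] * 256;
printf "pc %04x sp %04x return %04x\n", $PC, $SP, $return_to;
# prints pc 0010 sp 1ffe return 1234
```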

And that's it for the mainloop - after this, the only things left in the program are two closing braces followed by the __DATA__ section with the ROMs.

Next Episode

The next part of this series will dive into the emulated (and/or simulated :) VT100 hardware itself.