Addressing Modes
Absolute (data segment):
0000: abs
0001: abs, x
0010: abs, y
0011: abs, s
Absolute (code segment):
0100: code
0101: code, x
0110: code, y
0111: [code], y
Page Zero/GPRs:
1000: reg
1001: reg, x
1010: reg, y
Special form of abs,s:
1011: stk, s (1-byte displacement)
Indirect addressing modes:
1100: [reg]
1101: [reg,x]
1110: [reg], y ;Data segment pointer
1111: [reg], y ;Code segment pointer
Register Specifications
General-purpose registers (GPRs, that is, zero page locations) can take multiple sizes. The default 65000 encoding directly supports two sizes: bytes (8 bits) and words (16 bits). Through the use of a size override prefix byte (ugh!), instructions could also handle double words (dwords/32 bits) or quad words (qwords/64 bits). I’m treating the size override prefix byte as a “future enhancement.” However, it would be foolish not to allow for larger operands (than 16 bits) in the initial design, even if the expectation is that the 65000 will be a 16-bit processor.
Because the 65000 supports up to 256 registers, instructions like LDA and STA will go away. In their place we need some way of specifying which GPR to use and the size of that GPR. I’m going with the following syntax:
B(nn) - Byte-sized register at zero-page location nn (nn is a numeric value in the range 0 to 255).
W(nn) - Word-sized register at zero-page location nn.
D(nn) - DWord-sized register at zero-page location nn.
Q(nn) - QWord-sized register at zero-page location nn.
R(nn) - Register size/type is irrelevant or determined by instruction (e.g., loading a pointer from a GPR).
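As a rough illustration of how an assembler might read this notation, here is a minimal Python sketch. The function name `parse_register` and the `SIZES` table are hypothetical (nothing like them appears in the design); the sizes themselves come straight from the list above.

```python
import re

# Operand size in bytes for each register prefix; R(nn) has no fixed size.
SIZES = {"B": 1, "W": 2, "D": 4, "Q": 8, "R": None}

def parse_register(operand):
    """Return (zero-page location, size in bytes or None) for text like 'W(16)'."""
    match = re.fullmatch(r"([BWDQR])\((\d+)\)", operand.strip().upper())
    if match is None:
        raise ValueError(f"not a register operand: {operand!r}")
    location = int(match.group(2))
    if not 0 <= location <= 255:
        raise ValueError(f"register location out of range: {location}")
    return location, SIZES[match.group(1)]
```

For example, `parse_register("W(16)")` yields location 16 with a two-byte size, while `parse_register("R(5)")` yields location 5 with no size attached.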
I can’t decide if registers should be required to be aligned at an address that is a multiple of their size. On the one hand, that simplifies the hardware; on the other, it complicates the software. Hard call. I’ll leave that to anyone who actually decides to build one of these things.
I: Interrupt disable bit (same as 6502).
S: Single-step mode (generates a single-step break after each instruction).
[A1:A0]: Address size control. Generally intended for future expansion of the 65000. %00 specifies 16-bit addresses, %01 specifies 24-bit addresses, %10 specifies 32-bit addresses, and %11 specifies 64-bit addresses.
N: The sign flag.
V: The overflow flag.
Z: The zero flag.
C: The carry flag.
I’ll start with the absolute addressing modes as they are probably most familiar to anyone who has programmed a 6502 in assembly language.
The absolute addressing modes reference a data value in the data segment.
abs: accesses the data value at the address specified by the sign-extended displacement value associated with the opcode.
abs, x: accesses the location specified by (disp + X) in the data segment. Note that “abs” could be the name of an array with X indexing into that array (usual 6502 paradigm), or X could be the base address of the object and the disp value is a signed offset from that location.
abs, y: ditto (as for abs, x but using the Y register instead of the X register).
abs, s: accesses an object at some offset (specified by the displacement) from the stack pointer register. You would not normally use the S register as an index register with this mode.
The code addressing mode:
This is very similar to the absolute addressing mode except it references data (usually read-only constants) in the code segment. As the code segment is usually sitting in ROM (FLASH), storing data into the code segment generally doesn’t accomplish anything. The modes are:
code: accesses the data value at the address (in the code segment) specified by the sign-extended displacement value associated with the opcode.
code, x: accesses the location specified by (disp + X) in the code segment.
code, y: accesses the location specified by (disp + Y) in the code segment.
[code], y: The value at the location specified by the displacement (“code”) in the code segment is an address (this could be two to eight bytes depending on the 65000 address size). Fetch this value and add the value of the Y index register to get the effective address; access the data at the effective address.
Note that “[code], y” replaces “abs, s” in the absolute addressing modes. “code, s” just doesn’t make any sense at all (the stack is always in the data segment). “[code], y” seemed useful (and very 6502-ish), so I added it. This addition is a bit non-orthogonal (“What? No [code], x or [code, x] modes?”). This seemed like a better response than simply reserving that encoding.
reg: page zero access. Note that the zero-page (GPR) address is always encoded as a single byte specifying a zero page location in the range 0 to 255 ($00 to $FF). This fully encodes all 65000 GPRs with a single byte.
reg, x: accesses the location specified by (disp + X) in the zero page segment. If the sum of (disp + X) exceeds 255, the 65000 uses only the LO eight bits as the offset into page zero.
reg, y: ditto (as for reg, x but using the Y register instead of the X register).
stk, s: This is a single-byte encoding of the “abs, s” addressing mode, with the displacement being limited to -128 to +127. Note that you can encode half of these offsets with the “abs, s” mode. So there is some overlap here. That was deemed a “good thing” because most offsets from S will be in the range ±128.
Finally, we get to the page zero addressing modes (which should mostly be familiar):
[reg]: Indirect addressing mode. A single byte (displacement) specifies a GPR. That GPR holds an address (consuming as many bytes as are necessary to hold an address, which could be 2, 3, 4, or 8 bytes). Access the data location specified in the data segment.
[reg, x]: compute (disp + X), truncating the result to eight bits. Fetch the pointer in the page zero segment at the address specified by this sum. Use the value of that pointer as the effective address into the data segment.
[reg], y: The displacement (GPR number) is the address of a pointer in the page zero segment. Fetch the value of that pointer and sum that with the value in the Y register. That sum is the effective address to use (in the data segment).
[reg], y (code): Similar to the above, except that the effective address references a location in the code segment rather than the data segment.
Except for the JMP and JSR instructions, any instruction that accesses memory can use any of these addressing modes (where appropriate; storing into the code segment is generally frowned upon).
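To make the indirect modes a little more concrete, here is a minimal Python sketch of the effective-address calculation. Several things in it are assumptions rather than part of the design: pointers are stored little-endian (the 6502 convention), a pointer that runs off the end of page zero wraps back to location 0, and the final sum is masked to the current address size. The names `fetch_pointer` and `effective_address` are made up for the illustration.

```python
# Pointer size in bytes for each A1/A0 setting (16-, 24-, 32-, and 64-bit addresses).
ADDRESS_BYTES = {0b00: 2, 0b01: 3, 0b10: 4, 0b11: 8}

def fetch_pointer(page_zero, location, a1a0):
    """Read a pointer from page zero (assumed little-endian, wrapping in the page)."""
    value = 0
    for i in range(ADDRESS_BYTES[a1a0]):
        value |= page_zero[(location + i) & 0xFF] << (8 * i)
    return value

def effective_address(mode, reg, x, y, page_zero, a1a0=0b00):
    """Compute the effective address for the three data-segment indirect modes."""
    mask = (1 << (8 * ADDRESS_BYTES[a1a0])) - 1
    if mode == "[reg]":
        return fetch_pointer(page_zero, reg, a1a0)
    if mode == "[reg,x]":
        # (disp + X) is truncated to eight bits before the pointer fetch.
        return fetch_pointer(page_zero, (reg + x) & 0xFF, a1a0)
    if mode == "[reg],y":
        # Y is added after the pointer fetch; masking the sum is an assumption.
        return (fetch_pointer(page_zero, reg, a1a0) + y) & mask
    raise ValueError(f"unknown mode: {mode}")
```

With a 16-bit pointer $1234 stored at locations $10/$11, `[reg]` with reg = $10 gives $1234, and `[reg],y` with Y = 4 gives $1238.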
Register, #Immediate instructions
Immediate Reg
0 0 1 s 0 i i i [opcode] [register] [disp/imm]
0 0 0 ADD reg, #imm
0 0 1 ADC reg, #imm
0 1 0 CMP reg, #imm
0 1 1 AND reg, #imm
1 0 0 BIT reg, #imm
1 0 1 OR reg, #imm
1 1 0 EOR reg, #imm
1 1 1 LDR reg, #imm
The “register, #immediate” instructions perform arithmetic/logical operations on a register with a constant value.
The “s” bit is encoded with 0 for byte (or dword) operands, 1 for word (or qword) operands. Note that dword and qword operands will require a size override prefix byte before the instruction to widen the operand size from byte/word to dword/qword.
If you look closely, you’ll notice that the SUB/SBC instructions are missing. You can easily synthesize these by using ADD/ADC and supplying a negated constant. It also turns out that the ANDN (and not) instruction is missing. You can synthesize this with an AND instruction, inverting the immediate operand.
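The two synthesis tricks are easy to check with a few lines of Python. This is only a sketch of the result values for byte-sized operands; it does not model the flags (the carry after an ADD of a negated constant need not match what a real SUB would produce), and the function names are hypothetical.

```python
def add8(a, b):
    """Byte-sized ADD: the result wraps to eight bits."""
    return (a + b) & 0xFF

def sub_via_add(reg, imm):
    """SUB reg, #imm synthesized as ADD reg, #-imm (two's-complement negation)."""
    return add8(reg, (-imm) & 0xFF)

def andn_via_and(reg, imm):
    """ANDN reg, #imm synthesized as AND reg, #(NOT imm)."""
    return reg & (~imm & 0xFF)
```

For instance, `sub_via_add(0x50, 0x10)` gives 0x40, and `andn_via_and(0xFF, 0x0F)` gives 0xF0.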
Immediate operations on the index registers are also missing from this list.
Don’t worry, those will come later.
The I, N, V, Z, and C flags have the same meaning and usage as on the 6502. The decimal flag on the 6502 is gone (you can do decimal arithmetic manually if you really need it; look up excess-3 coding). The 6502 B flag is gone because the BRK and IRQ vectors are different on the 65000. The S flag is new; it generates a single-step exception (like BRK) after the execution of each instruction.
The A1/A0 flags are probably the most interesting addition here. They control the size of an address on the 65000 (much like the M and X bits in the 65c816). These bits control the size of the X, Y, S, and PC registers as well as the size of pointers in page zero and the code segment.
When A1/A0 are programmed with %00, all addresses are 16 bits. Pointers in page zero are two adjacent bytes. Likewise, pointers appearing in the code segment are two adjacent bytes.
When A1/A0 are programmed with %01, addresses become 24 bits (three bytes) long. Pointers appearing in the page zero or code segments consume three bytes each.
When A1/A0 are programmed with %10, addresses become 32 bits and when A1/A0 contain %11, addresses become 64 bits (eight bytes) long.
Note that A1/A0 do not change the physical width of the X, Y, S, or PC registers. Presumably, they can handle any address size that comes along. If you load a value into one of these registers that is larger than the current address size, the 65000 will only use the LO bits that match the A1/A0 specification.
If someone were to actually build a 65000, they could elect to ignore the A1/A0 settings and use only 16-bit addresses (for example). However, they must still be able to handle an instruction like the following:
ldx #someValueLargerThan16Bits
They might only load the LO 16 bits of that displacement value into the X register, but the CPU must be able to deal with the fact that the displacement could be up to nine bytes long (with an 8-byte signed value).
Most importantly, if a CPU does support addresses larger than 16 bits and an application programs A1/A0 with %00, the CPU must limit the addresses it puts on the bus to 16 bits (masking the HO bits to zeros), but it should maintain full addresses in the index registers, SP, and PC (in case the program switches back to a larger address size).
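The masking rule can be sketched in a couple of lines of Python. The name `bus_address` is made up for the illustration; the bit widths are the ones given for the A1/A0 settings. The register value itself is left untouched, only the address placed on the bus is masked.

```python
# Address width in bits for each A1/A0 setting.
ADDRESS_BITS = {0b00: 16, 0b01: 24, 0b10: 32, 0b11: 64}

def bus_address(register_value, a1a0):
    """Address driven onto the bus: HO bits beyond the address size read as zero."""
    return register_value & ((1 << ADDRESS_BITS[a1a0]) - 1)
```

So an X register holding $12345678 produces bus address $5678 when A1/A0 is %00, $345678 when it is %01, and the full $12345678 when it is %10, without X itself ever changing.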
Program Status Register
The Program Status Register (PSR) is very similar to the 6502’s. It’s been expanded to 16 bits to allow a better organization of the bits (condition codes versus system settings). The 65000 PSR is the following:
Note that there is no “s” bit encoding for these instructions.
Register, Register Instructions
If this were a RISC machine, this is where most of the “computations” would be taking place: registers. These instructions are three bytes long consisting of an opcode, a source register encoding, and a destination register encoding.
The Instruction Set
Okay, I’ve beat the preliminaries to a pulp at this point. Let’s take a look at the instruction set:
Single Register Instructions
The following instructions take a single register as their operand.
Single-register instrs
0 1 1 s i i i i [opcode] [register]
0 0 0 0 SHL reg
0 0 0 1 SHR reg
0 0 1 0 ASR reg
0 0 1 1 ROL reg
0 1 0 0 ROR reg
0 1 0 1 NOT reg
0 1 1 0 PUSH reg
0 1 1 1 POP reg
1 0 0 0 ANDNPSR reg
1 0 0 1 ORPSR reg
Register, #imm instructions are 3+ bytes long. One byte for the opcode, one byte to encode the register, and one (or more) bytes to encode the immediate value as a displacement.
Branches Instr
0 0 0 0 i i i i [opcode] [disp]
0 0 0 0 0 0 0 0 BEQ disp
0 0 0 0 0 0 0 1 BNE disp
0 0 0 0 0 0 1 0 BCS disp
0 0 0 0 0 0 1 1 BCC disp
0 0 0 0 0 1 0 0 BVS disp
0 0 0 0 0 1 0 1 BVC disp
0 0 0 0 0 1 1 0 BMI disp
0 0 0 0 0 1 1 1 BPL disp
0 0 0 0 1 0 0 0 BULE disp
0 0 0 0 1 0 0 1 BUGT disp
0 0 0 0 1 0 1 0 BLT disp
0 0 0 0 1 0 1 1 BLE disp
0 0 0 0 1 1 0 0 BGT disp
0 0 0 0 1 1 0 1 BGE disp
0 0 0 0 1 1 1 0 BSR disp
0 0 0 0 1 1 1 1 BRA disp
Except for the ANDNPSR and ORPSR instructions, these instructions operate directly on the register specified by their operand (it is both the source and destination for the operation).
These are the usual 6502 logical and arithmetic shift and rotate (through carry) instructions. The semantics are identical to the 6502 other than the expanded operand sizes.
Note that it is perfectly reasonable to push bytes, words, dwords, or qwords onto the stack. The 65000 stack does not need to be aligned to any particular size (though there is always the argument that real hardware might be faster if the SP is aligned to a word or dword boundary).
The ANDNPSR and ORPSR instructions are slightly different from the others in this group. They operate on the PSR, not the specified register. ANDNPSR clears all the bits in the PSR that correspond to set bits in the reg operand. The ORPSR instruction logically ORs the register with the PSR.
If the s bit in the opcode is a 0, then this instruction only affects the LO eight bits of the PSR. If the s bit is 1, this instruction affects all 16 PSR bits. Size prefix override bytes are ignored by this instruction.
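Here is a minimal Python sketch of the two PSR instructions, including the effect of the s bit. The function names are hypothetical, and since the bit positions of the individual flags are not spelled out here, the sketch just treats the PSR as a 16-bit value.

```python
def andnpsr(psr, reg, s):
    """Clear the PSR bits that are set in reg; only the LO eight bits when s is 0."""
    mask = 0xFFFF if s else 0x00FF
    return psr & ~(reg & mask) & 0xFFFF

def orpsr(psr, reg, s):
    """OR reg into the PSR; only the LO eight bits when s is 0."""
    mask = 0xFFFF if s else 0x00FF
    return (psr | (reg & mask)) & 0xFFFF
```

With a register value of $0101, `andnpsr(0xFFFF, 0x0101, 0)` leaves $FFFE (only bit 0 cleared), while the s = 1 form leaves $FEFE.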
Reg-to-Reg Instrs
0 1 0 s i i i i [opcode] [src reg] [dest reg]
0 0 0 0 ADD dest, src
0 0 0 1 ADC dest, src
0 0 1 0 SUB dest, src
0 0 1 1 SBC dest, src
0 1 0 0 CMP left, right
0 1 0 1 AND dest, src
0 1 1 0 ANDN dest, src
0 1 1 1 BIT src1, src2
1 0 0 0 OR dest, src
1 0 0 1 EOR dest, src
Branches are encoded with a single-byte opcode followed by a displacement value (1-9 bytes, though it would be hard to imagine something beyond 3-4 bytes).
Many of the branch names should be familiar to 6502 programmers:
BEQ: Branch if equal (Z=1).
BNE: Branch if not equal (Z=0).
BCS: Branch if carry set (C=1).
BCC: Branch if carry clear (C=0).
BVS: Branch if overflow set (V=1).
BVC: Branch if overflow clear (V=0).
BMI: Branch if minus (N=1).
BPL: Branch if plus (N=0).
BULE: Branch if less than or equal (unsigned).
BUGT: Branch if greater than (unsigned).
BLT: Branch if less than (signed).
BLE: Branch if less than or equal (signed).
BGT: Branch if greater than (signed).
BGE: Branch if greater than or equal (signed).
BSR: Branch to subroutine.
BRA: Unconditional branch (branch always).
A Bit of a Mind Game: The 65000
6502 assembly language programmers will notice a couple of new instructions here:
First, there are a set of conditional branches for use after comparing signed values. Signed branches were left out of the original 6502 instruction set to simplify things. It took several instructions to synthesize these. Therefore, they were added to the 65000 instruction set to fix this omission.
Also notice the BRA and BSR instructions that provide a PC-relative unconditional branch and subroutine call.
There are also a couple of unsigned branches added here: BULE and BUGT (less than or equal and greater than). You may notice that BUL (branch if less than, unsigned) and BUGE (branch if greater than or equal, unsigned) are missing. The BCS and BCC instructions are synonyms for these (respectively). A decent assembler would probably allow BZS and BZC for BEQ/BNE. It would probably allow BSS and BSC for BMI/BPL, as well.
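The flag tests behind the signed branches are not spelled out here, so the following Python sketch uses the conventional two's-complement conditions (less than is N != V, and so on). It also assumes that the 65000 CMP sets V from the subtraction, which the 6502 CMP does not do. Both are assumptions, and all the names are hypothetical.

```python
def flags_after_compare(left, right, bits=8):
    """Return (N, V, Z) for left - right, assuming CMP updates V like a subtract."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    left &= mask
    right &= mask
    result = (left - right) & mask
    n = bool(result & sign)
    z = result == 0
    # Signed overflow: operand signs differ and the result sign differs from left.
    v = bool((left ^ right) & (left ^ result) & sign)
    return n, v, z

# Conventional signed-branch conditions (assumed, not taken from the design).
SIGNED_BRANCHES = {
    "BLT": lambda n, v, z: n != v,
    "BGE": lambda n, v, z: n == v,
    "BGT": lambda n, v, z: (not z) and n == v,
    "BLE": lambda n, v, z: z or n != v,
}

def taken(branch, left, right, bits=8):
    """True if the named signed branch would be taken after CMP left, right."""
    return SIGNED_BRANCHES[branch](*flags_after_compare(left, right, bits))
```

For example, comparing $FF (that is, -1) against $01 takes BLT, and comparing $7F (+127) against $80 (-128) takes BGT even though the subtraction overflows.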
Single-Byte Instructions
The following are all instructions encoded with a single byte:
Single-byte opcodes
0 1 1 i 1 0 1 i [opcode]
0 0 Size prefix byte
0 1 Reserved
1 0 NOP
1 1 BRK
Single-byte stack opcodes
0 1 1 i 1 1 i i [opcode]
0 0 0 PUSHX
0 0 1 POPX
0 1 0 PUSHY
0 1 1 POPY
1 0 0 RET
1 0 1 IRET
1 1 0 PUSHPSR
1 1 1 POPPSR
These instructions should be mostly familiar to 6502 programmers. ADD and SUB are new (but their function should be obvious). ANDN (and not) is new. This instruction computes
dest = dest AND (not src)
The synonym BIC (bit clear) probably makes this operation clearer: it clears all the bits in dest corresponding to the set bits in src.
Note: see the discussion of the BIT instruction at the end of the document. Its semantics have changed slightly from the 6502 version.
The [reg] addressing mode specifies some GPR in the page zero segment. The 65000 fetches the pointer at that address (size specified by the A1/A0 PSR bits) and copies that pointer into the PC. This transfers control to the specified location in the code segment.
The [reg, x] addressing mode fetches the pointer in page zero from the location specified by the sum of the register number and the contents of the X register. The 65000 then copies this pointer into the PC. Again, control is transferred to that location in the code segment.
The [reg, y] addressing mode is similar to the above, but uses the contents of the Y register.
The [abs], [abs, x], and [abs, y] addressing modes are similar to the above, except that a displacement value is added to the index registers to obtain the address of the pointer in the data segment. The 65000 copies that pointer into the PC, which transfers control to the specified location in the code segment.
The [code], [code, x], and [code,y] addressing modes operate in a similar manner to the above, except they fetch pointer values from the code segment and move those values into the PC.
Keep in mind that the pointers these instructions fetch can be 2, 3, 4, or 8 bytes long, depending on the A1/A0 bits in the PSR.
The JSR instructions behave just like the JMP instructions except they push the address of the next instruction on the stack before transferring control to the target location.
Most of these instructions are identical to their 6502 counterparts (if you can forgive me for using “pop” instead of “pull”). The only two worthy of much comment are the size prefix byte and the reserved opcode.
The size prefix byte sets a flag in the CPU so that the next instruction’s data size is increased by a factor of four (bytes become dwords, words become qwords). If the next instruction doesn’t have an “s” bit encoding the size, then the instruction ignores this prefix and it has no further effect on program execution.
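Putting the “s” bit and the size prefix together, the operand size works out as in this small Python sketch (the function name `operand_size` is made up for the illustration):

```python
def operand_size(s_bit, size_prefix=False):
    """Operand size in bytes from the opcode's s bit and an optional size prefix."""
    base = 2 if s_bit else 1                    # s = 0: byte, s = 1: word
    return base * 4 if size_prefix else base    # prefix: byte -> dword, word -> qword
```

That gives 1 and 2 bytes without the prefix, and 4 and 8 bytes with it.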
The “Reserved” byte is reserved for future expansion. This is the only single-byte opcode that is not spoken for. I could easily have dreamed up a use for it, but I wanted to leave it open “just in case.”
One day as I was sitting around, bored, and looking for something more interesting to do than my actual work (writing a book, at this particular point), my mind started wandering back to the good old Apple II days. While I wasn’t nostalgic enough to go dig my Apple // GS out of the storage trailer, I began wondering if any progress
Note that the reg-to-reg instructions only consume ten of the possible 16 binary encodings.
010s1010 and 010s1011 are reserved for two-byte opcodes.
010s11xx opcodes will be defined shortly.
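The way the 010sxxxx opcode space is carved up can be sketched as a small Python classifier. It assumes the reg-to-reg instructions occupy consecutive encodings 0000 through 1001 in the order they are listed in the table, and that the 010s11xx group is the index register immediate group. The name `classify` and the returned strings are hypothetical.

```python
# Reg-to-reg mnemonics in table order (assumed to be encodings 0000 through 1001).
REG_TO_REG = ["ADD", "ADC", "SUB", "SBC", "CMP", "AND", "ANDN", "BIT", "OR", "EOR"]

def classify(opcode):
    """Classify a first opcode byte of the form 010sxxxx."""
    if opcode >> 5 != 0b010:
        raise ValueError("not a 010sxxxx opcode")
    low = opcode & 0x0F
    if low < len(REG_TO_REG):
        return REG_TO_REG[low]
    if low in (0b1010, 0b1011):
        return "two-byte opcode prefix"
    return "index register immediate"  # 010s11xx
```

For example, `classify(0b01001010)` reports a two-byte opcode prefix, and `classify(0b01011001)` reports EOR with the s bit set.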
You might have noticed that not all the bits were spoken for in the encodings of the JMP and JSR instructions (above and beyond the ones that were left “reserved”). Those extra bit patterns form the basis for several additional instructions that follow. We’ll start with the Special Immediate Instructions:
Special Immediate Instrs
0 1 0 0 1 0 1 0 0 0 0 1 1 1 i i [opcode1] [opcode2] [imm disp]
1 1 1 0 0 RET #imm
1 1 1 0 1 SYS #imm
1 1 1 1 0 TSXAS #imm
1 1 1 1 1 TSYAS #imm
Index Register Immediate Instructions
The following instructions provide various operations with immediate operands for the X, Y, and S registers:
X & Y Immediate
0 1 0 0 1 1 i i [opcode] [imm disp]
0 0 LDX #imm
0 1 LDY #imm
1 0 LDS #imm
1 1 ADDS #Imm
0 1 0 1 1 1 i i [opcode] [imm disp]
0 0 CPX #imm
0 1 CPY #imm
1 0 ADDX #imm
1 1 ADDY #imm
The Load and Store Instructions
Okay, we’ve arrived at the most heavily-used instructions in the instruction set (well, along with the load immediate instruction, presented earlier).
Load/Store Reg Instructions
1 0 i s m m m m [opcode] [register] [disp]
0 LDR reg, mem
1 STR reg, mem
1 1 i i m m m m [opcode][disp]
0 0 LDX mem
0 1 STX mem
1 0 LDY mem
1 1 STY mem
had ever been made on an upgraded 65816 CPU. I did some quick internet searches and rapidly discovered that the vast majority of information on such an upgrade were links to an ancient article that *I* had written for Micro: The 6502 Journal,
many eons ago:
Bill Mensch, head honcho at the Western Design Center had talked about a 65832 CPU back then. Apparently some talk had been made about an actual 65T32 CPU (a real 32-bit device). However, nothing ever happened.
As an old 6502/65816 fan, it was somewhat sad that the venerable 6502 never had a 65000 part to match the 68000 (not to mention a 32-bit part).
These instructions should seem familiar to 6502 programmers. Granted, there are no LDA and STA instructions, because we no longer have an accumulator register. But in their place we have the LDR and STR instructions and 256 registers!
Another nice thing about these instructions is that they are very orthogonal with respect to memory addressing modes. If the addressing mode is legal, it can be used with all these instructions. For example,
LDY abs, y
is not legal on the 6502, but works just fine on the 65000 (and could be quite useful for stepping through linked lists, for example).
Note: use LDX R(nn) syntax to load X from a GPR. The size is determined by A1/A0, not the register size. Ditto for LDY, STX, and STY.
The RET #Imm instruction is a variant of the RET instruction that adds the value of the immediate/displacement constant following the opcode to SP after popping the return address. This should be a positive number (as subtracting the constant from SP can produce some weird results). This instruction is useful for removing parameters and other data from the stack upon return from a subroutine.
The SYS #Imm instruction does a system call. It pushes a return address onto the stack that points at the displacement constant. It is the system call handler’s responsibility to parse the displacement (presumably extracting that number as a system function number) and push a return address past the displacement back onto the stack.
The TSXAS and TSYAS instructions transfer the SP to the X or Y registers (respectively) and then subtract the immediate value from the SP register (that is, transfer S to X/Y and adjust stack). This is generally done to set up a stack frame inside a procedure and allocate local variables for it.
Note that it is the subroutine’s responsibility to first push the X or Y register onto the stack if it contains a frame pointer that needs to be saved.
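The stack-pointer arithmetic for TSXAS and RET #imm can be sketched in Python as follows. The sketch assumes the stack grows downward, as on the 6502, and that a return address is two bytes (A1/A0 = %00); neither is stated outright here, and the function names are hypothetical.

```python
def tsxas(s, imm):
    """TSXAS #imm: return (new X, new S). X receives the old S, then S drops by imm."""
    return s, s - imm

def ret_imm(s, imm, address_bytes=2):
    """RET #imm: return the new S after popping the return address and imm more bytes."""
    return s + address_bytes + imm
```

For example, with S = $1000, `tsxas(0x1000, 16)` leaves X = $1000 as the frame pointer and S = $0FF0, reserving 16 bytes of locals. A subroutine entered with four bytes of parameters above its return address at $0FFA returns with `ret_imm(0x0FFA, 4)`, leaving S = $1000.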
LDX, LDY, CPX, and CPY do the usual 6502 things: loading a register with, or comparing it against, an immediate value. Note that the X, Y, and S registers are pointer-sized, not 8 or 16 bits. The [imm disp] displacement value could be up to nine bytes long for these instructions (producing a 64-bit immediate value). Regardless of the address size, the 65000 should load this whole value into the register (assuming the physical implementation is that large) and simply ignore the HO bits (beyond the address size) when actually outputting memory addresses to the bus.
The LDS instruction is new, but its purpose should be obvious: initializing the stack pointer.
ADDS, ADDX, and ADDY are all new; they add their immediate constant to the respective register. Note that there are no increment or decrement instructions (of any kind) on the 65000. You can increment a register by adding one to it. You can decrement the register by adding -1 to it.
Two-Byte Opcodes
At this point, we’ve exhausted all the 256 possible single-byte opcodes. To keep adding instructions (and we're not done yet), we’ll have to go to two-byte opcodes. Note that this is not the same thing as two-byte instructions. Very few of the instructions up to this point were single-byte instructions even though they had single-byte opcodes. Adding register bytes and displacement bytes expanded the average size of the instructions in the previous section. If you limit displacements to one byte, the average instruction length in the previous section is probably between 2 and 3 bytes (not too far off from the 6502). In this section, we’re going to add an extra opcode byte, so expect the average length to jump to between 3 and 4 bytes (assuming 1-byte displacements; worse if they are larger).
Some Stack Instructions
Next up are some useful SP manipulation instructions:
Stack instructions
0 1 0 1 1 0 1 0 0 1 i i m m m m [opcode1] [opcode2] [disp]
0 0 LDS mem
0 1 STS mem
1 0 ADDS mem
1 1 SUBS mem
The four-bit binary number appearing before each addressing mode corresponds to the “mmmm” encoding you’ll find in the instructions later in the document.
These instructions load SP from some memory location, store SP into a memory location, add the contents of some memory location to SP, or subtract the contents of some memory location from SP.
Note: because the “mmmm” encoding for the addressing modes (see their description earlier) includes page zero GPRs, you can also use these instructions to load SP from a GPR, store SP into a GPR, add the value of a GPR to SP, or subtract the value in a GPR from SP, e.g.,
LDS R(5)
Jumps and Calls
Although the instruction set up to this point has included the BSR and BRA instructions, there’s some functionality missing. Those instructions only provide PC-relative branches to their target locations. If you need indirect jumps and calls, those won’t work. So, this section adds a wide variety of indirect jumps and calls to complete the set.
It would be nice to use the existing memory addressing modes for the pointers in our indirect jumps and calls. However, due to the nature of the Harvard Architecture the 65000 uses, the existing addressing modes aren’t sufficient. Also, what does it mean to jump to location “abs, y”? For that reason, the jump and call instructions (JMP and JSR) get their own set of addressing modes.
Jumps & Calls
0 1 0 0 1 0 1 0 0 0 0 i i i i i [opcode1] [opcode2] [disp]
0 0 0 0 0 Reserved
0 0 0 0 1 jmp [reg]
0 0 0 1 0 jmp [reg, x]
0 0 0 1 1 jmp [reg, y]
0 0 1 0 0 Reserved
0 0 1 0 1 jmp [abs]
0 0 1 1 0 jmp [abs, x]
0 0 1 1 1 jmp [abs, y]
0 1 0 0 0 jmp code
0 1 0 0 1 jmp [code]
0 1 0 1 0 jmp [code, x]
0 1 0 1 1 jmp [code, y]
0 1 1 0 0 Reserved
0 1 1 0 1 Reserved
0 1 1 1 0 Reserved
0 1 1 1 1 Reserved
1 0 0 0 0 Reserved
1 0 0 0 1 jsr [reg]
1 0 0 1 0 jsr [reg, x]
1 0 0 1 1 jsr [reg, y]
1 0 1 0 0 Reserved
1 0 1 0 1 jsr [abs]
1 0 1 1 0 jsr [abs, x]
1 0 1 1 1 jsr [abs, y]
1 1 0 0 0 jsr code
1 1 0 0 1 jsr [code]
1 1 0 1 0 jsr [code, x]
1 1 0 1 1 jsr [code, y]
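The jump and call table has enough regularity that the second opcode byte can be decoded field by field. Here is a minimal Python sketch, reading the five “i” bits as one jmp/jsr bit, two bits selecting reg/abs/code, and two bits selecting the operand form. That field split is my reading of the table rather than something the design states, and the name `decode_jump` is hypothetical. Reserved encodings, and the jsr-side patterns used by the special immediate instructions, come back as None.

```python
# Bits 3..2 of the second opcode byte select where the pointer lives.
BASES = {0b00: "reg", 0b01: "abs", 0b10: "code"}

def decode_jump(opcode2):
    """Decode a 000iiiii second opcode byte into jump/call assembly text, or None."""
    if opcode2 >> 5:
        raise ValueError("second opcode byte is outside the 000iiiii group")
    mnemonic = "jsr" if opcode2 & 0b10000 else "jmp"
    base = BASES.get((opcode2 >> 2) & 0b11)
    form = opcode2 & 0b11
    # Form 00 (no brackets) only exists for "code"; it is reserved for reg and abs.
    if base is None or (form == 0 and base != "code"):
        return None
    operand = [base, f"[{base}]", f"[{base}, x]", f"[{base}, y]"][form]
    return f"{mnemonic} {operand}"
```

For example, `decode_jump(0b01000)` gives `jmp code` and `decode_jump(0b11011)` gives `jsr [code, y]`.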
In many respects, the ARM is probably the spiritual successor to the 6502. It was created by the same folks who designed the 6502-based BBC Micro way back when (Acorn Computers). They developed the “Acorn RISC Machine,” or ARM as we know it today. Anyone who has programmed both the 6502 and ARM in assembly language can probably recognize some similarities.
Now there was some talk at one point about a 32-bit 650K CPU. I recall thinking at the time that this was going to be as big a disaster as the (newly announced) Intel Itanium (which was finally discontinued in 2019). Fortunately, the 650K never saw the light of day.
In any case, back to the future… Still looking for any excuse to avoid real work, I thought “why don’t I work out a design for a 6502 extension as a bit of a mental exercise. Maybe I’ll call it the 65020 (a play on the 68000 and 68020).” Whoops! A quick internet search and I discover that someone’s already played that game:
https://www.ucc.gu.uwa.edu.au/~john/65020.html
https://www.ucc.gu.uwa.edu.au/~john/65020-instructions.html
While it was an interesting read, it was a very minor upgrade to the 6502, not at all what I was thinking about. So I set about wasting the better part of a day coming up with my own design, which I’ll just call the 65000 (I briefly toyed with the idea of 65032, but I didn’t want to associate it with 32-bitness).
To begin with, I wasn’t interested in developing a be-all/end-all 32- or 64-bit instruction set. Heck, the National Semiconductor NS32000 series (especially the NS32532) would have been my go-to choice to work from if I wanted to create yet another high-end CISC machine. However, that wasn’t the design philosophy of the 6502 and I wanted to carry that philosophy forward in my design.
The 6502 could be thought of as the RISC of its day. It was truly a “Reduced (Instruction Set) Computer” at the time. That is, MOS Technology designed the CPU to be as simple as they possibly could to yield a CPU they could sell for $25 when contemporaries (6800, 8080, and Z80) were going for around $200. They literally went in and said “can we get by without this instruction?” If so, they threw it out. A good example is the ADD instruction. The 6502 doesn’t have one. It has an ADC (add with carry) instruction. You have to explicitly clear the carry flag using the CLC instruction before ADC if you want a simple addition. The instruction set is highly non-orthogonal because they hand-picked addressing modes and instruction combinations to provide the minimum instruction set possible. Truly the RISC of its time.
(Note, BTW, that RISC does not stand for “Reduced (Instruction Set) Computer.” The correct emphasis is “(Reduced Instruction) Set Computer.” The point of RISC was to make each instruction as simple as possible, not make the instruction set as small as possible. Because of the (wonderful) addressing modes on the 6502, it definitely falls in the CISC camp.)
The “reg” items are page zero locations (0-255). See the discussion of registers for more details.
Instructions
Micro Article
X & Y Memory Operands
The following are instructions that operate on the X & Y index registers (note that loading values into X/Y was handled by the one-byte opcodes).
X & Y memory
0 1 0 1 1 0 1 0 1 i i i m m m m [opcode1][opcode2][disp]
0 0 0 CPX mem
0 0 1 CPY mem
0 1 0 ADDX mem
0 1 1 ADDY mem
1 0 0 SUBX mem
1 0 1 SUBY mem
1 1 0 LEX mem
1 1 1 LEY mem
Arithmetic/Logical Instructions with Memory Operands
One big hole in the instruction set is specifying a memory operand to a GPR for arithmetic and logical operations. Here are those instructions:
Reg-memory instrs
0 1 0 s 1 0 1 1 i i i i m m m m [opcode1][opcode2][reg][disp]
0 0 0 0 ADD zp, mem
0 0 0 1 ADC zp, mem
0 0 1 0 SUB zp, mem
0 0 1 1 SBC zp, mem
0 1 0 0 CMP zp, mem
0 1 0 1 AND zp, mem
0 1 1 0 ANDN zp, mem
0 1 1 1 BIT zp, mem
1 0 0 0 OR zp, mem
1 0 0 1 EOR zp, mem
As usual, GPRs look like memory in the memory addressing modes. But you must use the R(nn) (untyped) notation for the register operands when specifying GPRs as the operands to these instructions. LEX and LEY are “load effective address (into X and Y)” instructions.
So what should the design goals of a 65000 CPU be?
Design it for embedded systems. Think “better than Atmel” rather than “beating the pants off ARM.”
Memory efficiency is a primary goal. 64k code + 64k data is perfect for most small embedded applications. Have the ability to expand beyond this, as necessary. But optimize for small systems.
Do not allow the design to be limited to a particular address or data size. It should be able to support 8-, 16-, 32-, or even 64-bit data sizes; likewise, the address bus should be expandable without affecting any software.
Preserve the 6502 “small is better” philosophy. Don’t worry about floating-point, vector, or other fancy instruction sets. Heck, the 6502 didn’t even have multiply or divide instructions; that would be fine, too.
Page Zero was one of the more interesting aspects of the 6502. Make sure that’s present in the new design.
Don’t worry about threads, multitasking, running Linux, or other high-end applications. Think “Arduino,” not Macintosh. Don’t worry about the cost of context switches.
Correct a few deficiencies in the 6502 instruction set (like lack of signed branches and, of course, the ADC/SBC issue).
Improve the orthogonality of the instruction set (but don’t take it to VAX-11 or NS32532 levels).
Make the instruction set slightly better for compiler code generation. Nothing outrageous. Just things like having a stack bigger than 256 bytes and allowing easy access to data on the stack. Keep in mind, however, that this is a mental exercise and any “programming” for this hypothetical beast is going to be in assembly language (and probably on paper). Don’t go crazy with the compiler stuff.
It would be really nice if the design was kept simple enough so that it has a snowball’s chance of an actual implementation on an FPGA, such as the Alchitry Au+ sold by SparkFun (https://www.sparkfun.com/products/17514).
Index Registers
The 65000 maintains the X and Y index registers from the 6502. I was tempted to add a third index register (Z), or to allow any of the general-purpose registers (GPRs/page zero) to be used. I nixed this idea for a couple of reasons:
Encoding registers takes bits in the opcode. Add one register and you double the number of bits you need to encode common addressing modes. I want to keep instructions short.
Adding a register adds a whole set of instructions to the instruction set to load, store, compare, adjust (increment/decrement), transfer to other registers, etc.
The improvements obtained by adding a single index register weren’t worth the complexity.
Allowing the use of the GPRs completely blows up the instruction sizes, generally adding at least one additional byte to each opcode. Definitely not worth it when memory efficiency is a main priority.
One thing that has stuck in my mind for nearly 40 years was a discussion I once had with Steve Wozniak about the 6502. He was lamenting the fact that MOS Technology’s designers didn’t simply use page zero as the registers on the machine. Not knowing anything at the time about CPU design (or even designing an instruction set architecture), I thought this sounded like a good idea. In retrospect, after taking undergraduate and graduate computer architecture courses, I’ve come to understand why that wouldn’t have been a good idea way back in 1975.
Nevertheless, that idea has stuck in my head all these years so as I sat down to play this little mental game, realizing Woz’s wish was high on my priority list. Not to mention the fact that there are some very good architectural reasons for doing this.
The first place to start is with the basic architecture of the 65000.
Technically, you can encode instructions of the form “instr W(0), W(5)” with these encodings. However, the Register, Register instructions given earlier do this with single-byte opcodes. So you should only use these instructions when actually accessing (non-zero-page) memory operands.
Yay! Almost Done!
The Push/Pop Memory Instructions
Push/Pop Memory
0 1 0 s 1 0 1 1 1 1 1 i m m m m [opcode1][opcode2][disp]
0 PUSH mem
1 POP mem
Addressing Modes & Address Sizes
The 6502’s addressing modes were probably its saving grace. The few instructions available were compensated for by the plethora of different addressing modes. However, one issue with the 6502’s addressing modes is that they were very unorthogonal. You couldn’t arbitrarily use an addressing mode with any particular instruction. You could, for example, write “LDX ABS,Y” but not “LDX ABS,X” (and conversely, “LDY ABS,X” but not “LDY ABS,Y”). With 8-bit index registers, this probably fell into the category of “things programmers are unlikely to use, so we’ll strip them out.” It was truly a breath of fresh air when the Intel 8086 arrived and you really could use most addressing modes with most instructions.
So the first change is to make the 65000’s addressing modes orthogonal (at least, mostly so).
The second issue is the size of the 6502’s X & Y index registers: only eight bits. On the 65000, I’m going to define the size of X & Y as “whatever the size of an address happens to be.” At the bare minimum, they’ll be 16-bit registers. However, the instruction set will not associate a particular size with these registers. They hold pointers, whatever those are. This will allow the expansion of the 65000 from 16 bits to 32 or 64 bits, as desired, without having to change code.
Page zero (the GPRs) will still hold indirect pointers, just as it did on the 6502. However, as for the index registers, the size of a pointer will not be hard-coded into the architecture. They could be two, three, four, …, up to eight bytes in length.
Comments on various Instructions:
BIT:
The 6502 BIT instruction was clearly intended for quickly testing bits in I/O ports; its use on memory variables was probably a secondary thought. After all, if BIT were truly useful all over the place, why would it be stuck with just zero page and abs addressing modes? Now the concept of a general-purpose bit testing instruction (e.g., TEST on the x86) is genuinely useful. And BIT’s ability to set the N and V flags (along with Z) was brilliant (particularly for I/O). However, there is the issue of the weak addressing mode support.
On the 65000 the BIT instruction becomes a first-class instruction supporting all the addressing modes, including the immediate addressing mode. However, this is where things can get weird. The BIT instruction copies the HO and (HO-1) bits of the source memory operand into the N and V flags. This is cool for the intended purpose (testing I/O ports) but can be less than useful with immediate operands. With an immediate operand you could, of course, manually set and clear these bits (just like using STV/CLV). The 65000 already has ANDNPSR and ORPSR instructions, so this is somewhat redundant. However, the BIT instruction with an immediate operand will allow you to set one flag and clear the other simultaneously. If that’s ever useful, this is a cute little programming trick to know.
One thing that always annoyed me about the 6502 BIT instruction: if they were going to copy bits into the flags, why did they stop with the N and V flags? Obviously, the main purpose of the BIT instruction is to manipulate the Z flag, but it leaves the carry flag alone. Why not copy bit zero of the memory operand into the carry flag? While I’m sure someone has written code that expects carry to be left alone by BIT, I do believe that it would have been more useful tweaking the carry flag. So, I’m defining the BIT instruction on the 65000 to do this.
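To make the flag behavior concrete, here is a rough Python sketch of the BIT semantics described above. The function name bit_flags and the bits parameter are made up for illustration, and the rule for Z (zero when the register AND the operand is zero) is an assumption carried over from the 6502; only the N, V, and C behavior comes straight from the text.

```python
def bit_flags(reg: int, operand: int, bits: int = 8) -> dict:
    """Model the flags produced by a 65000-style BIT (illustrative only)."""
    mask = (1 << bits) - 1
    reg &= mask
    operand &= mask
    return {
        "N": (operand >> (bits - 1)) & 1,  # HO bit of the source operand
        "V": (operand >> (bits - 2)) & 1,  # HO-1 bit of the source operand
        "Z": int((reg & operand) == 0),    # assumed: zero result of reg AND operand
        "C": operand & 1,                  # bit zero, the 65000 addition
    }

# An immediate operand of $80 sets N and clears V in a single instruction.
flags = bit_flags(0xFF, 0x80)
```

With an immediate operand the N, V, and C results depend only on the constant, which is what makes the "set one flag, clear the other" trick possible.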
Effective Addresses:
The 65000 provides two instructions, LEX and LEY, for computing effective addresses. The instruction
lex mem
loads the X register with the address of mem, not the value contained at that address. Note that mem represents any legitimate 65000 addressing mode (that could be used, for example, by the LDR instruction).
Note that mem could reference data in any of the three segments: code, data, or page zero. For example, lex code,x loads X with the sum of the X register and the code offset. This is an address in the code segment. Likewise, ley abs, x loads Y with the sum of abs’s offset and the X register. This is an address in the data segment. As a final example, lex R(0),Y loads X with the value contained in Y (which is the sum of R(0)’s address in page zero (0) plus the value of Y). In all three examples you wind up with some pointer value in X or Y, but there is nothing about that pointer to indicate that it is a pointer into the code, data, or zero page segments. It is up to the application to use these addresses as pointers into the proper segment.
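The arithmetic behind LEX and LEY is just an addition, as this small Python sketch shows. The name effective_address is hypothetical, and wrapping the sum to the current address size is my assumption; the text does not say what happens on overflow.

```python
def effective_address(offset: int, index: int = 0, addr_bits: int = 16) -> int:
    """Sum a displacement and an index register, wrapped to the address size."""
    return (offset + index) & ((1 << addr_bits) - 1)

# lex code,x with a code offset of $1000 and X = $0020
x = effective_address(0x1000, 0x0020)

# lex R(0),Y: R(0) sits at page zero address 0, so the result is just Y
y_copy = effective_address(0x00, 0x1234)
```

Notice that the function returns a bare integer. Nothing in the result records which segment the offset came from, which is exactly the point made above.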
While on the subject of segments, at the assembly language level there may be a need to explicitly specify a segment when using a memory addressing mode. Normally, when you use statements like the following:
LDR B(0), varInCodeSeg, X
STR B(0), varInDataSeg, Y
The assembler automatically figures out what segment you’re using by the name of the variables. However, suppose you have the pointers to these variables (not indexes into an array) sitting in X and Y and you want to use the following instructions:
LDR B(0), $0, X ;Fetch data pointed at by X
STR B(0), $0, Y ;Store data to location pointed at by Y
How does the assembler know to encode these as absolute, zero page, or code segment addressing modes? It doesn’t, and the assembler would likely return an error. The solution is to use the “code:” or “data:” prefix on the address to specify those segments (or “R(0)” if you want to specify an address in page zero):
LDR B(0), code:$0, X ;Fetch data pointed at by X
STR B(0), data:$0, Y ;Store data to location pointed at by Y
STR B(0), R(0), y ;Store data in page zero at location pointed at by Y
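A hypothetical assembler might resolve the segment of a numeric address operand along the following lines. This is only a sketch: classify_segment is an invented name, and named variables (which would need a symbol table lookup) are deliberately left out.

```python
import re

def classify_segment(operand: str) -> str:
    """Return 'code', 'data', or 'page zero' for an address operand string."""
    text = operand.strip().lower()
    if text.startswith("code:"):
        return "code"
    if text.startswith("data:"):
        return "data"
    # B(nn), W(nn), D(nn), Q(nn), and R(nn) all name page zero locations.
    if re.fullmatch(r"[bwdqr]\(\s*\$?[0-9a-f]+\s*\)", text):
        return "page zero"
    raise ValueError(f"cannot infer segment for {operand!r}; add code: or data:")

segments = [classify_segment(op) for op in ("code:$0", "data:$0", "R(0)")]
```

A bare "$0" raises an error here, mirroring the behavior suggested above for an assembler that cannot guess the segment.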
Operand Sizes
The size of an instruction’s operands is encoded into the operands themselves, not the instruction mnemonic. As noted already, you specify GPR sizes using the B(nn), W(nn), D(nn), and Q(nn) notation for byte, word, dword, and qword types.
If two registers appear as the source and destination locations, their sizes must agree. The following would be illegal:
LDR B(5), W(6)
because the assembler wouldn’t know whether it’s moving a word or a byte. An assembler could relax this “sizes must match” requirement if one of the registers is untyped (that is, uses the “R(nn)” register notation):
LDR B(5), R(6)
because the assembler can infer that the second operand must be a byte register. However, you cannot specify both arguments as untyped values. The following would be illegal:
LDR R(5), R(6)
because the assembler would have no way of knowing what the type would be.
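The size rules for register, register operands boil down to a few comparisons. Here is a minimal Python sketch of how an assembler could apply them; the function name infer_operand_size and the table SIZES are illustrative only.

```python
SIZES = {"B": 1, "W": 2, "D": 4, "Q": 8, "R": None}  # R(nn) is untyped

def infer_operand_size(dest: str, src: str) -> int:
    """Return the operand size in bytes for a register, register instruction."""
    d = SIZES[dest[0].upper()]
    s = SIZES[src[0].upper()]
    if d is None and s is None:
        raise ValueError("both operands are untyped; size is unknown")
    if d is not None and s is not None and d != s:
        raise ValueError("operand sizes must agree")
    return d if d is not None else s

size = infer_operand_size("B(5)", "R(6)")  # the typed operand decides: one byte
```

Passing "B(5)" and "W(6)", or "R(5)" and "R(6)", raises an error, matching the two illegal examples above.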
Because index registers are always of type pointer, you do not need to specify an operand type, nor should you. There is no type information encoded in the instruction and the actual address size is specified by the A1/A0 PSR bits, not the instruction. So you would normally use R(nn) register notation when loading and storing index registers (and SP) with GPRs, e.g.,
LDX R(5)
STY R(0)
Most actual memory operands (beyond page zero GPRs) don’t contribute to the size of the instruction. Actual memory addressing modes (not simple GPR accesses) generally don’t need a type associated with the object address, so the following is reasonable:
LDR B(0), R(8), X
because the “B(0)” operand specifies the size, the indexed addressing mode doesn’t have to. An assembler should also be able to infer the size from:
LDR R(0), B(8), X
Note that the PUSH and POP memory instructions require a size specification if not simply pushing or popping a GPR:
PUSH B(8), X
For absolute (and code segment) addresses, some additional syntax is necessary:
PUSH word ptr data:$0, X ;Use MASM syntax!
An ABI for the 65000
An application binary interface specifies conventions for calling functions, passing parameters, register usage, data types, and so on. A chip’s designer, OS vendor, or compiler author usually specifies these things. As I’m the “designer” so to speak, the task falls on me to provide this information.
While I haven’t thought things completely through yet, I’d suggest the following conventions:
GPRs $E0 to $FF are reserved for NMIs
GPRs $C0 to $DF are reserved for interrupts
Note: if interrupts are reentrant, it is up to the ISRs to coordinate the use of the GPRs or to preserve their values across interrupts. Applications must never modify GPRs $C0 to $FF, as ISRs may maintain state between interrupts in those locations. Due to the nature of interrupts, applications cannot even preserve these values and restore them when they are done (at least, not for NMIs; if you turn off interrupts you could get away with this for IRQs, but the ABI still forbids that).
Note that applications may read the GPRs in the $C0 to $FF range; in fact, this is one of the main ways ISRs communicate with applications.
ISRs can use other GPRs in the system as long as they preserve their values. ISRs may not modify volatile GPRs without preserving them; “volatile” does not apply to ISRs.
GPRs $00 to $1F are volatile, and may be freely modified (without preserving) in a function. These can be used for local variables in a function (though don’t expect them to be preserved across function calls).
Function return results should be returned in GPR(0) (for as many bytes as needed to hold the function return result).
GPRs $20 to $3F hold parameters passed to a function. Parameter locations should be treated as volatile and moved to local storage inside a function if that function needs to call other functions. Parameters should be packed into the GPRs according to their size with no padding. If 32 bytes are insufficient for the parameters, then all the parameters should be passed on the stack rather than in the GPRs (using C calling convention: first parameter pushed last).
Caller is responsible for cleaning up the stack when a function returns (if parameters were passed on the stack).
GPRs $40 to $7F are non-volatile and should be preserved across a function call.
GPRs $80 to $BF are reserved for global objects. They are also non-volatile and should be preserved across function calls.
The SP should be initialized to the top of the data segment (e.g., LDS #-1).
Pointer sizes are to be set by the caller to match the function’s requirements (that is, the caller must set up the A1/A0 PSR bits if they differ for the caller). The address size bits are considered non-volatile; if a function changes them, it must restore them before returning to its caller. In fact, all the HO bits of the PSR are considered non-volatile and must be preserved across a call if a function changes them.
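The register conventions above are easy to capture in a lookup table. The following Python sketch is one way a tool (say, a linter for hand-written assembly) might encode them; the names GPR_ROLES, gpr_role, and assign_parameters are invented for this example, and placing the first parameter at $20 is my assumption.

```python
GPR_ROLES = [
    (0x00, 0x1F, "volatile"),
    (0x20, 0x3F, "parameters"),
    (0x40, 0x7F, "non-volatile"),
    (0x80, 0xBF, "globals"),
    (0xC0, 0xDF, "reserved for interrupts"),
    (0xE0, 0xFF, "reserved for NMIs"),
]

def gpr_role(nn: int) -> str:
    """Return the ABI role of page zero location nn."""
    for low, high, role in GPR_ROLES:
        if low <= nn <= high:
            return role
    raise ValueError("GPR numbers run from 0 to 255")

def assign_parameters(sizes):
    """Pack parameters into GPRs $20..$3F with no padding.

    Returns the starting GPR number of each parameter, or None when the
    parameters need more than 32 bytes and must all go on the stack.
    """
    if sum(sizes) > 32:
        return None
    slots, next_gpr = [], 0x20
    for size in sizes:
        slots.append(next_gpr)
        next_gpr += size
    return slots

slots = assign_parameters([1, 2, 8])  # a byte, a word, and a qword
```

Because there is no padding, the word in this example starts at the odd location $21, which ties back to the open question of whether registers must be aligned.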
Displacements
The 6502 encodes two address sizes in its instructions: one-byte “page zero” offsets and two-byte “absolute” offsets. As the 65000 also has a page zero segment and a data (absolute) segment, it would seem that these same offset sizes would work for page zero and absolute addressing modes. The only problem is that this scheme would largely prevent a seamless expansion beyond a 16-bit address bus (which violates the design goals). To overcome this problem, the 65000 is going to use a Huffman-style, prefix-coded variable-length displacement encoding that allows single-byte, two-byte, or up to eight-byte displacements.
65020
These two instructions complete the instruction set (I think; I could have missed something important). They push and pop memory locations. Note that there are one-byte opcodes that push GPRs, so you should use those rather than these instructions for pushing registers. Use these instructions to push memory. Note that you can push bytes and words (see the “s” bit) as well as dwords and qwords (with a size override prefix byte).
Note: See the discussion of the BIT instruction at the end of this document for details on its behavior with an immediate operand.
Displacement Ranges
0xxxxxxx -64 to +63
10xxxxxx xxxxxxxx -8192 to +8191
110xxxxx b1 b2 ±1 million
1110xxxx b1 b2 b3 ±134 million
11110xxx b1 b2 b3 b4 ±17 billion
111110xx b1 b2 b3 b4 b5 ±2 trillion
1111110x b1 b2 b3 b4 b5 b6 ±281 trillion
11111110 b1 b2 b3 b4 b5 b6 b7 56 bits
11111111 b1 b2 b3 b4 b5 b6 b7 b8 Full 64 bits.
Notice how this displacement scheme uses the most significant ‘0’ bit to determine the size, in bytes, of the displacement value. If a ‘0’ appears in the HO bit, then the remaining seven bits form a signed (two’s complement) value specifying a range of -64 to +63. With the most significant zero in bit 6, the length of the displacement is two bytes forming a 14-bit displacement in the range -8,192 to +8,191. And so on. The Displacement Ranges table shows the complete range of displacements.
The 65000 will use a modified Harvard Architecture. In the Harvard Architecture the code and data sit in different memory spaces (accessed via different busses). The 65000 will extend this notion to include a third memory space: Page Zero. A pure Harvard Architecture is pretty rare in general-purpose computer systems (because of the expense of bringing out separate busses). However, it is quite a bit more common in embedded microcontrollers (MCUs), where the code sits in ROM (FLASH) directly on the MCU and, in fact, RAM is often incorporated on the MCU as well. The only slightly unusual thing here is the fact that there are three separate memory spaces (which I’ll call segments).
Note: in the original 6502 the page zero space was literally page zero, the first page (256 bytes) in the memory space. On the 65000, “Page Zero” is separate from the normal data memory. Writing to absolute address $0000 is different from writing to zero page address $00.
Why make this change? Because it doubles the amount of memory available using 16-bit addresses. Remember, memory efficiency is a primary goal in this design. Once addresses grow beyond 16 bits, each pointer or other address is going to consume three, four, or more bytes in memory. That can be expensive. Keeping addresses small is a Good Thing™.
Of course, it will be necessary to be able to access data in the code segment. We fix this issue with a special addressing mode. Also note that we don’t want to force 16-bit-only addressing on applications. In a moment you’ll see how I fix this problem by using a trick taken straight out of the 65816 playbook by using the PSR (program status register).
The Page Zero memory area is similar to page zero on the 6502 insofar as it allows reduced-sized address encodings in instructions. Technically, however, I should probably have named this area the register bank. That’s because, true to Woz’s wishes, I’ve made the page zero memory area the general-purpose register area on the 65000. Yup, it’s got 256 registers!
The cool thing about using the page zero locations as registers is that they constitute an array that can be treated as bytes, words, double-words, quad-words, whatever. Although my initial design is for a 16-bit CPU, the registers are trivially extendable to 32 or 64 bits (or even larger, if you care to dream). You want 64-bit registers? Just grab eight consecutive bytes in page zero and have at it.
So with page zero, you’ve got 256 8-bit registers, 128 16-bit registers, 64 32-bit registers, 32 64-bit registers, etc. All without changing the programming model.
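A simulator could model page zero as a plain 256-byte array with differently sized views on top of it. The sketch below is one possible way to do that in Python; the class name PageZero and its methods are hypothetical, and little-endian byte order is assumed because the 65000 is described as a little-endian machine.

```python
class PageZero:
    """256 bytes of page zero viewed as overlapping registers."""

    def __init__(self):
        self.mem = bytearray(256)

    def _check(self, nn: int, size: int) -> None:
        if nn < 0 or nn + size > 256:
            raise IndexError("register runs past the end of page zero")

    def write(self, nn: int, size: int, value: int) -> None:
        self._check(nn, size)
        masked = value & ((1 << (8 * size)) - 1)
        self.mem[nn:nn + size] = masked.to_bytes(size, "little")

    def read(self, nn: int, size: int) -> int:
        self._check(nn, size)
        return int.from_bytes(self.mem[nn:nn + size], "little")

regs = PageZero()
regs.write(0x10, 4, 0x12345678)  # D(16)
low_word = regs.read(0x10, 2)    # W(16) aliases the low half of D(16)

# How many non-overlapping registers of each width fit in 256 bytes
counts = {bits: 256 // (bits // 8) for bits in (8, 16, 32, 64)}
```

The counts dictionary is nothing more than 256 divided by the register size in bytes, so it scales to wider registers without any change to the model.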
And that does it!
I believe I’ve created an instruction set that is largely compatible with the original 6502 philosophy, just allowing the expansion to 16 (or more) bits and a possibly larger address space.
Yup, there are some important instructions missing you’d expect to find on a modern CPU (MUL and DIV immediately come to mind). And an embedded CPU really ought to have more bit manipulation facilities. There are still a few holes in the two-byte opcode map. Worse comes to worst, you could also grab the one single-byte opcode that is left and use that as a prefix for another set of two- or even three-byte opcodes and go wild.
However, my attitude is that if I wanted to create a modern (CISC) processor, I’d totally give up on the 6502 philosophy and look somewhere else. If you want “easy to build”, look at the RISC-V. If you want something that is the epitome of CISC, I’d start with something like the National Semiconductor NS32532.
The bottom line, however, is that this 65000 design is probably going to be a handful to implement (even with FPGAs). So I, personally, wouldn’t dream too much greater than this if I ever wanted to actually implement something.
So I guess I should grab one of those SparkFun/Alchitry Au+ FPGAs and learn how to program in Lucid. Or at the very least, write a simulator to run on something like a Teensy 4 or Raspberry Pi Pico. Of course, I’d have to write an assembler, disassembler, a loader, and a bunch of other tools, too.
No memory variants!
For architectural reasons, I’ve chosen not to implement read/modify/write instructions on the 65000. This is in direct contrast to the 6502, which supports RMW for a small subset of the instruction set. This was a hard decision to make, as these RMW instructions (that is, the shifts and rotates) are popular instructions on the 6502. The bottom line is that I ran out of instruction opcodes and RMW instructions cause a fair amount of grief for CPU designers. On the plus side, zero page (GPR) access is still possible.
There is one “gotcha” with respect to displacements. Although the 65000 is a little-endian machine (just like the 6502) displacements are encoded in big-endian format. This makes it easier for hardware decoding circuitry to pick out the actual displacement bits from the leading size bits.
Except for zero-page accesses (which the 65000 always encodes as a single byte), displacement values are used for address and immediate constant encodings throughout the instruction set. Though, in the worst case, the displacement encoding could be slightly larger than the actual address or immediate data value, most offsets and constants are much smaller than the maximum size, so this displacement encoding saves space (in the average case).
Consider, for example, the 6502 BEQ statement. This is immediately followed by a single byte specifying an offset of -128 to +127 bytes around the current instruction. That same BEQ on the 65000 would be limited to a range of -64 to +63 bytes around the current instruction. So this looks like a losing proposition. However, consider what happens if the branch is out of range. On the 6502, you have to replace the instruction with a sequence like the following:
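Here is a minimal Python sketch of an encoder and decoder for this displacement format. It takes the bit layouts literally (a run of leading 1 bits, a 0 bit, then 7 payload bits per encoded byte, stored big-endian) and always picks the shortest form. The names encode_disp and decode_disp are invented for illustration; none of this comes from an actual 65000 toolchain.

```python
def encode_disp(value: int) -> bytes:
    """Encode a signed integer using the shortest displacement form."""
    for nbytes in range(1, 9):
        bits = 7 * nbytes  # payload bits available in this form
        if -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
            # (nbytes - 1) leading 1 bits followed by a single 0 bit
            prefix = ((1 << (nbytes - 1)) - 1) << (bits + 1)
            word = prefix | (value & ((1 << bits) - 1))
            return word.to_bytes(nbytes, "big")
    if -(1 << 63) <= value < (1 << 63):
        # 11111111 prefix byte, then a full 64-bit value
        return b"\xff" + (value & ((1 << 64) - 1)).to_bytes(8, "big")
    raise ValueError("displacement does not fit in 64 bits")

def decode_disp(buf: bytes, offset: int = 0):
    """Return (value, length in bytes) for the displacement at buf[offset]."""
    first = buf[offset]
    ones = 0
    while ones < 8 and first & (0x80 >> ones):
        ones += 1
    if ones == 8:  # eight payload bytes follow the prefix byte
        nbytes, bits = 9, 64
        raw = int.from_bytes(buf[offset + 1:offset + 9], "big")
    else:
        nbytes, bits = ones + 1, 7 * (ones + 1)
        raw = int.from_bytes(buf[offset:offset + nbytes], "big")
        raw &= (1 << bits) - 1  # strip the length prefix
    if raw & (1 << (bits - 1)):  # sign-extend the payload
        raw -= 1 << bits
    return raw, nbytes

encoded = encode_disp(-1)  # a single byte
value, length = decode_disp(encode_disp(1000))
```

Because the payload is big-endian, the decoder only has to look at the first byte to learn the length, then mask off the prefix bits in one step.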
bne skip
jmp beqOriginalLabel
skip:
This turns out to consume five bytes of storage (three for the JMP, two for the BNE). The 65000 BEQ, however, gracefully expands to a range of -8,192 to +8,191 with a single additional byte (a 14-bit displacement). By the time you hit four bytes, you’re well beyond the 16-bit address range of the 6502. So from an instruction size point of view, this is a big win (unless, of course, all your branches are greater than ±64 bytes but less than ±128 bytes).
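The trade-off is easy to tabulate. The following Python sketch compares the number of bytes each CPU needs for a conditional branch at a given distance. It assumes a one-byte 65000 branch opcode and 7 payload bits per displacement byte; both assumptions, and all of the function names, are mine.

```python
def disp_length(offset: int) -> int:
    """Bytes needed by the variable-length displacement (7 payload bits per byte)."""
    for nbytes in range(1, 9):
        limit = 1 << (7 * nbytes - 1)
        if -limit <= offset < limit:
            return nbytes
    return 9

def beq_size_6502(offset: int) -> int:
    # Two bytes when in range; otherwise a BNE (2 bytes) over a JMP (3 bytes).
    return 2 if -128 <= offset <= 127 else 5

def beq_size_65000(offset: int, opcode_bytes: int = 1) -> int:
    # opcode_bytes = 1 is an assumption made for this comparison.
    return opcode_bytes + disp_length(offset)

for offset in (50, 100, 1000, 8000):
    print(offset, beq_size_6502(offset), beq_size_65000(offset))
```

Under these assumptions the 65000 only loses for offsets that fit in the 6502's signed byte but not in seven bits, and it wins for everything beyond ±128.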
The one exception to the displacement rule is zero-page offsets. They always fit into a single byte. Therefore, the 65000 encodes them in a single byte.
Note that the 65000 always sign extends the displacement/immediate value to whatever size it needs. So the instruction:
ldx #-1
will be encoded with a single byte displacement (the immediate value) and sign-extended to whatever the size of the X register happens to be.
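As a rough illustration, the sign extension for a one-byte immediate might look like this in Python. The width table follows the A1:A0 address size bits in the PSR (%00 is 16 bits, %01 is 24, %10 is 32, %11 is 64); the function name load_index_immediate is hypothetical.

```python
ADDRESS_BITS = {0b00: 16, 0b01: 24, 0b10: 32, 0b11: 64}  # selected by PSR A1:A0

def load_index_immediate(first_byte: int, a1a0: int) -> int:
    """Sign-extend a one-byte (7-bit) immediate to the index register width."""
    width = ADDRESS_BITS[a1a0]
    value = first_byte & 0x7F
    if value & 0x40:  # sign bit of the 7-bit payload
        value -= 0x80
    return value & ((1 << width) - 1)  # register contents as an unsigned pattern

# ldx #-1 is encoded with the single immediate byte $7F
x16 = load_index_immediate(0x7F, 0b00)
x32 = load_index_immediate(0x7F, 0b10)
```

The same one-byte encoding fills X with all ones whether the address size is 16 or 64 bits, which is why code does not have to change when the address size grows.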
On reset, the 65000 will begin execution at address $0000 in the code segment (note that this differs from the 6502, which jumps indirectly through the interrupt vector at address $FFFC/$FFFD in memory). An unconditional jump or branch instruction must appear at this location that transfers control to the actual power-on reset code. IRQs, NMIs, BRKs (software breakpoints), and system calls will transfer control through the following vectors:
$0008: 8-byte pointer to IRQ handler.
$0010: 8-byte pointer to NMI handler.
$0018: 8-byte pointer to BRK handler (software breakpoint instruction handler).
$0020: 8-byte pointer to single step handler.
$0028: 8-byte pointer to system call handler.
$0030 to $00FF: reserved for future use.
The reset code, therefore, can start anywhere at or after location $100 in memory.
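For a simulator, the vector table could be modeled as follows. This Python sketch assumes the 8-byte pointers are stored little-endian (the text only says the machine is little-endian in general), and the names VECTORS and handler_address are made up for the example.

```python
VECTORS = {
    "IRQ": 0x0008,
    "NMI": 0x0010,
    "BRK": 0x0018,
    "single step": 0x0020,
    "system call": 0x0028,
}

def handler_address(code_memory: bytes, event: str) -> int:
    """Fetch the 8-byte handler pointer for an event (little-endian assumed)."""
    base = VECTORS[event]
    return int.from_bytes(code_memory[base:base + 8], "little")

# A tiny code segment image with the NMI handler placed at $0180
rom = bytearray(0x200)
rom[0x0010:0x0018] = (0x0180).to_bytes(8, "little")
nmi = handler_address(bytes(rom), "NMI")
```

On a 16-bit part only the low two bytes of each vector would matter; the remaining six are the room reserved for a larger address space.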
Although the code space is nominally 64k, needing only 16-bit pointers, the vector table reserves 64 bits (eight bytes) for each vector for future expansion of the address space.
The following is a somewhat out of date article (since there are
technical specs on the 65c832 apparently available) about the
65c832 chip. This originally appeared as an article in // Technical
a year or two ago. Interesting reading for assembly and hardware
types.
A Report on the 65c832
There have been lots of rumors concerning possible upgrades to the
Apple //GS system. One central theme to an upgrade is an enhanced
65xxx microprocessor. Ever since the introduction of the 65c816
microprocessor, stories concerning its eventual upgrade, the 65c832,
have flourished. Like all tales, the tale of the 65c832 chip has included
fact intertwined with fiction. Someone would say "This is what I wish
the chip would do." People repeating the story translated it to "This is
what the chip will do." and wishes became design goals. For many of us,
the 65c832 is going to be a big disappointment. We've been led to believe
that the 65c832 will do all kinds of wonderful things. When it finally
appears, it will represent a natural evolution of the '816, not the radical
departure that everyone wants. The kind of stuff that big let downs are
made of. However, keep one thing in mind-- the '832 will be completely
upwards compatible with the '816, so no matter what it does
differently, it certainly cannot be worse than the 65c816.
A Little History
The 65c832 microprocessor's origins began over 12 years ago at
Motorola. Motorola's microprocessor division had just released their
6800 microprocessor and they were working on designs for the next
generation chip. A rift developed and a maverick band of engineers, led
by Chuck Peddle, splintered off to form MOS Technologies. Their first
product, the 6501 microprocessor was greeted by a flurry of lawsuits
from Motorola. The 6501 was pin-compatible with the 6800 and
Motorola didn't like the fact that MOS Technologies' parts could be
substituted into a board designed for Motorola parts. So MOS
Technologies introduced an improved version of the 6501, the 6502,
which had improved hardware and a different pin-out. Several
commercially available computer systems employed the 6500 family
microprocessors including the KIM, SYM, AIM, OSI, PET, Atari,
Commodore, and, of course, the Apple I and Apple II. There were
actually a dozen or so different 6500 microprocessors. They all shared
the same instruction set (indeed, the same silicon chip), they differed
mainly in the packaging used to hold the chip. For example, the Atari
2600 VCS system used a 28 pin version of the 6502 called the 6507.
Around August, 1978, one of MOS Technologies' second sources,
Synertek, began circulating specifications for a new 6500
microprocessor called the 6516. This chip was a pseudo-sixteen bit
processor designed to compete with the new Motorola 6809
microprocessor. This chip introduced a few new addressing modes and
several new instructions. Probably the most unique thing about it was
that it used a set of processor status register bits to control whether or
not the A, X, and Y registers, or memory operands operated in eight or
sixteen bit mode. The (previously) unused bit in the P register became a
user flag in the 6516. The 6516 sported sixteen-bit accumulator, X, Y,
PC, and SP registers. It also incorporated an eight-bit "Z" register
which controlled the location of the zero page.
In terms of addressing modes, the 6516 supported the following
addressing modes:
- immediate,
- implied,
- register,
- direct page,
- direct page indirect,
- direct page indexed by X,
- direct page indexed by Y,
- direct page indexed by X indirect,
- direct page indirect indexed by Y,
- absolute, absolute indexed by X,
- absolute
- absolute indirect
- absolute indexed by X
- absolute indexed by Y
- 8 and 16 bit relative
The instruction set included all of the 6502's instructions plus LDZ
(STZ), LDS (load SP), LHA (load H.O. A byte), LHX (load H.O. X byte),
LHY (load H.O. Y byte), LAX (load A from location pointed at by X), SAX
(store A at (X)), LAY/SAY (load/store A at (Y)), ADD (no need to clear
carry), SUB (no need to set carry), INC/DEC accumulator, TAZ (init Z
register), TZA (get current Z register value), YPC (transfer Y to PC --
JMP (Y)), PCY (copy current PC into Y), XHA/XHX/XHY (swap A, X,
and Y halves), XXY (exchange values in X/Y registers), SEF/CLF
(set/clear user flag), LDQ (load "Q" processor register with an
immediate value), SEV (set overflow flag), AXA/AYA (add X/Y to A),
AAX/AAY (add A to X or Y), AMX/AMY (add memory to X or Y), NEG
(negate accumulator), several new shift and rotate instructions including
RLT, RRT, ASR, RHL, RHR, RXL, RXR, RYL, and RYR, BFS/BFC (branch if
user flag set/clear), JNE/JEQ (jump long if not equal/equal), PHD/PLD
(push/pop 16-bit A), PHX/PHY/PLX/PLY/PHZ/PLZ (push/pop X, Y, and
Z registers), PHR/PLR (push/pop all registers), BR1..BR5 (five new
BRK/software interrupt instructions).
In addition to the new instructions, Synertek enhanced several old
instructions by adding new addressing modes. They also reduced the
number of cycles needed to execute various instructions, for example,
many implied addressing mode instructions took only one cycle (rather
than two) on the 6516.
After reading over the Synertek technical notes, I immediately
wrote an article for Micro, the 6502 Journal discussing the 6516
microprocessor. The very next month after publication one of
Synertek's representatives wrote a letter to Micro swearing up and
down that there was no such project, never was such a project, and that
I'd made the whole thing up. Funny, I still have in my possession, on
Synertek letterhead, technical notes #34 and #40 which describe the
features of the SY6516 microprocessor.
The SY6516 never saw the light of day. Synertek's representatives
who had come around and shown me the specs for the SY6516 were
simply gauging people's reactions to the chip. Apparently, the reactions
weren't strong enough to forge ahead with the product. An advanced
65xx processor was not forthcoming from Synertek.
Around 1980, MOS Technologies (having long since been bought out
by Commodore) began making noises about a 16-bit upgrade to the 6502
designed to compete with the 68000, Z8000, and 8086 microprocessors.
In the true one-upsmanship style common to semiconductor houses, MOS
Technologies called their chip the 650,000. I glanced over the
extremely tentative specs for the chip. It reminded me of Intel's iAPX-
432 processor so I immediately wrote the chip off. Fortunately for
Commodore's sake, MOS Technologies completely abandoned work on the
chip before it got out of the wish list stage.
It seemed as though the 6502 was destined to be left behind by
semiconductor houses. The introduction of the IBM PC cemented the
8086's future and killed off any hopes for Zilog's Z8000. The Motorola
68000 was hanging in there due to its superior architecture, and the
introduction of the LISA and Macintosh computers guaranteed success for
the 68K. Unfortunately, the success of the 68K was almost the final nail
in the 6502's coffin. Almost everyone producing 65xx machines in any
quantity (Apple, Commodore, and Atari) had switched over to the 68K
and were waiting for their 6502 machines to die off.
The 65xx family might have truly died off were it not for one man.
Bill Mensch, one of the original 6502 designers, loved his chip. If he
couldn't get the big companies to design his "6502 dream machine", he'd
start his own company and do it himself. With some layout tape, a couple
of sheets of mylar, and help from his family, Bill laid out a chip that
was a modest improvement over the 6502-- the 65c02. Bill's company,
The Western Design Center, licensed the 65c02 chip to several large
companies including Rockwell (who made some modifications to the
instruction set), GTE, NCR, and VLSI Technologies. Eventually the 65c02
found its way into the Apple //c and the Apple //e ensuring its success.
The 65c02, however, was not what Bill had in mind. It was a
springboard. A revenue producing commodity product a development
company could use to finance more ambitious products. Those ambitious
products were the 65c802 and 65c816 chips. At the time the Western
Design Center (WDC) was designing the 65c816, Zilog was working on a
comparable 16-bit upgrade to the Z80, the Z800. Somewhere around
1984 (I can't remember the exact date), EDN published an article
comparing the work on the 65c816 with the work on the Z800. It was a
David and Goliath story. Tiny WDC vs. the giant Zilog. Both companies
were having problems with their chip designs. But it was WDC, laying
out their chip with layout tape and mylar sheets on the kitchen table who
beat Zilog with their fancy CAD/CAE systems to market. Perhaps you
can actually buy a Z800 today, I'm not really sure. One thing's for sure,
with the demise of CP/M there's no market for such a chip.
The 65c816's design was not without its problems. Bill Mensch
"improved" the bus interface on the 65c816 (over that used by the
6502). Unfortunately, the Apple's disk drive controller relied on some
of the old kludges in the 6502 chip. With those problems removed, the
65c802 and 65c816 chips worked fine on an Apple computer, but the
disk drives didn't work. Of course Apple immediately began lamenting
about the stupidity of the designers at WDC and WDC's designers
immediately began complaining about the stupidity of Apple's design. In
the long run, money won out. If WDC wanted Apple to use the '816, WDC
would have to redesign the chip. They did. This exchange, combined
with the fact that the 65c816 was two years late coming to market, was
the beginning (if not the cause) of an adversarial relationship between
Apple and WDC. There are those at Apple who feel that the schedule of
the 65c816 was one of the major reasons Apple cancelled the ill-fated
Apple //x project.
Eventually, the 65c816 functioned properly and Apple incorporated
it into the Apple //GS. This guaranteed a modicum of success for the
'816 part. There's nothing like a high-visibility personal computer to
guarantee a chip's success. This has worked for the Z80 (TRS-80),
6502 (Apple, Atari, Commodore, and numerous others), 8088 (IBM),
and 68000 (Apple, Atari, Commodore, Sun, and others). Chips that have
not lived up to their makers' expectations, like the Z8000 and 32000,
were never adopted by a major personal computer manufacturer. So the
65c816 seems to have everything going for it.
That brings us up to date. Bill Mensch and the Western Design
Center are not resting on their laurels. They've been busy designing
several new microprocessors, including the 65c265, a single-chip
microcomputer incorporating a 65c816, built-in RAM and ROM, parallel
I/O ports, counter/timers, serial ports, a built-in LAN, and lots of other
goodies. This chip isn't destined for a personal computer; it will find its
way into controller applications like microwaves, stereos, telephones,
and other sophisticated electronic products. These are jellybean-type
devices that produce a constant income for their designer. So it only
makes sense that WDC finish the design of these parts.
Unfortunately, the design of parts like the 65c265 takes certain
resources. Since WDC is not a gigantic conglomerate, it has limited
resources. If all your manpower, time, and money are going towards the
development of the 65c265, you don't have any left for the '832. That's
exactly what was happening with the 65c832 as of June, 1988. It's a
concept that WDC employees kick around all the time, but on which active
work has yet to begin. On the positive side, there's still time to
influence WDC's design. On the down side, it will be a couple of years,
at least, before the 65c832 is real.
Some 65c832 Features and Design Philosophy
Since the '832 is still at the earliest design phases, there's not a lot
of solid information I can give you concerning the chip. There are some
comments Bill Mensch made at the 65c832 standards conference last
June in Phoenix, Arizona, that might shed some light on what you can
expect to see.
First of all, don't expect a 16- or 32-bit bus on the 65c832. One of
Bill Mensch's design goals is to produce a chip that is pin compatible with
the 65c816. He wants you to be able to unplug your 65c816 in your
Apple //GS and pop in a 65c832 and continue running old software on
your Apple //GS. This guarantees some compatiblity with existing
hardware, but it definitely limits performance due to bus bandwidth
limitations. Bill mentioned the possibility of a 65032 chip which
supports a full 32-bit data and address bus, but he'd have to be
convinced there is a need for such a part before he would commit to it.
You can also expect to find integer multiply and divide instructions
and probably a set of floating point instructions on the '832. I don't know
a whole lot about chip design, but I do know that floating point
instructions take a lot of effort and silicon to implement. Why do you
think all of the other major manufacturers have gone to separate floating
point coprocessor chips? Indeed, originally the 65c832 was going to be
a floating point coprocessor for the 65c816. Placing the floating point
processor on the chip may cause major design problems (and their
attendant delays) for the 65c832. Hopefully the folks at WDC know
what they're getting into and can handle this in stride. By the way, you
can blame/thank Mike Westerfield for WDC's insistence that the floating
point instructions will be on chip. Mike told Bill that he would only
support the floating point instructions if they were on-chip. He wouldn't
support them in his compilers if the 65c832 used a separate FPU chip.
This convinced the folks at WDC that floating point had to be on-chip.
This probably offhand remark from ByteWorks may end up killing the
whole project. Everyone else has had trouble building coprocessors,
much less putting floating point right on the chip. Perhaps WDC can pull
off another David and Goliath and put everything on one chip. However,
I'd rather they played it safe and actually built a 65c832, sans floating
point, rather than go for the gold and give up on the project or go out of
business in the process.
Naturally the 65c832 is going to support full 32-bit registers
everywhere. This includes the A, X, Y, Z, PC, S, DBR, PBR, and D
registers. This means you can place the direct page or stack anywhere
in memory. Furthermore, you will be able to align the program and data
banks on any arbitrary byte boundary. This will greatly enhance
Apple's memory management and segmentation techniques. Of course, the
PBR and DBR registers won't be absolutely necessary (since all
addresses are now 32 bits), but they'll still be around for compatibility
with the '816 chip.
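To make the "arbitrary byte boundary" point concrete, here is a minimal
Python sketch comparing the two address calculations. The '816 half
follows the existing chip, where the 8-bit DBR supplies bits 16 to 23 of
an absolute address. The '832 half is purely my assumption of how a
32-bit DBR might act as a base address, since WDC hasn't defined it.
The function names are made up for illustration.

```python
def ea_816(dbr, addr16):
    """65c816 absolute addressing: the 8-bit DBR is the bank byte,
    so a data bank always starts on a 64K boundary."""
    return ((dbr & 0xFF) << 16) | (addr16 & 0xFFFF)


def ea_832(dbr32, offset):
    """ASSUMED '832 behavior (not a WDC specification): the DBR is a
    full 32-bit base, so a data "bank" can start at any byte address."""
    return (dbr32 + offset) & 0xFFFFFFFF


# '816: bank $02, address $1234
print(hex(ea_816(0x02, 0x1234)))
# Assumed '832: the bank starts at the odd address $00021001
print(hex(ea_832(0x00021001, 0x1234)))
```

Under that assumption the memory manager could hand out a data segment
starting at any address it likes, rather than only at multiples of 64K.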
WDC will upgrade the 65c816's instruction set using the currently
undefined WDM (William D. Mensch) opcode. Bill Mensch hinted that the
'832 will use this opcode as a prefix byte to other instructions to
change their meaning. The Z80 and 6809 chips successfully used this
technique to expand their instruction sets over the 8080 and 6800
microprocessors. This technique has one major drawback: it lengthens
each instruction that uses the prefix, which in turn increases the
amount of execution time necessary for such instructions (by at least
one cycle to fetch the opcode prefix byte). Therefore, native '832
instructions will run slower than comparable '816 native mode
instructions.
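Here is a rough Python sketch of why a prefix byte costs you. It
decodes one instruction and counts the bytes fetched; on an 8-bit bus
each fetched byte costs at least one bus cycle. The $42 value is the
65c816's reserved WDM opcode and $A9/$EA are the familiar LDA immediate
and NOP opcodes, but the prefixed meanings and operand sizes in the
tables are my own guesses, since the real '832 encoding doesn't exist yet.

```python
WDM_PREFIX = 0x42  # the 65c816's reserved WDM opcode

# opcode -> (mnemonic, operand bytes); 16-bit accumulator assumed
BASE_OPS = {0xA9: ("LDA #imm16", 2), 0xEA: ("NOP", 0)}

# HYPOTHETICAL meanings after a WDM prefix (illustration only)
PREFIXED_OPS = {0xA9: ("LDA #imm32", 4), 0xEA: ("NOP", 0)}


def decode(code, pc=0):
    """Decode one instruction; return (mnemonic, bytes_fetched)."""
    fetched = 1
    opcode = code[pc]
    table = BASE_OPS
    if opcode == WDM_PREFIX:
        # The prefix is an extra byte that must be fetched before
        # the real opcode, and it selects the alternate table.
        opcode = code[pc + 1]
        fetched += 1
        table = PREFIXED_OPS
    mnemonic, operand_len = table[opcode]
    return mnemonic, fetched + operand_len


print(decode(bytes([0xEA])))        # plain NOP
print(decode(bytes([0x42, 0xEA])))  # prefixed NOP: one more fetch
```

Even an instruction with no operand at all doubles its fetch count once
the prefix is in front of it.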
Pure relative addressing is another topic Bill has espoused. On the
'832 you'll be able to write truly relocatable programs. This feature
alone will dramatically reduce the loading time and size of application
programs running on an Apple //GS. It will also improve memory
management facilities on the GS since the loader and memory manager
will be able to move relocatable blocks of code around in memory at
execution time. This will dramatically improve the garbage collection
abilities of the GS memory manager.
Beyond this, there isn't much I can say about the 65c832 chip.
Addressing modes, instruction types, data types, and other new features
are all up in the air at this point. Of course, if you've got some ideas
about your own 65c816 "dream machine," WDC would love to hear from
you. Jot your ideas down and mail them to:
William D. Mensch
c/o The Western Design Center
2166 East Brown Road
Mesa, Arizona 85203
(602) 962-4545
While you're at it, put in a plug for the 65032. I'd dearly love to see
a true 32-bit Apple II using the NuBus. The 65032 is just the ticket for
such an item.
Keep in mind that WDC isn't the only possible source for an upgraded
65c816 chip. Rumor has it, unlikely as it sounds, that Apple is designing
an upgrade using gate array technology. Perhaps WDC will have some
competition; who knows? Whatever the case, there's a definite upgrade
path for the Apple II family in the works.