Commits

Anonymous committed af69867

Import of Shelta version 1.1 revision 2009.0307 sources.

  • Parent commits 2a0466d
  • Tags rel_1_1_2009_0307

Files changed (6)

+bin/sheltan.com: src/shelta86.s
+	nasm src/shelta86.s -o bin/sheltan.com
+
+all: bin/sheltan.com
+

File bin/bootstrp.bat

 @echo off
-REM BOOTSTRP.BAT v2002.1208 (c)1999 Chris Pressey, Cat's-Eye Technologies.
+REM BOOTSTRP.BAT
+REM v1.1.2009.0307 (c)1999-2009 Chris Pressey, Cat's-Eye Technologies.
 REM Builds the bootstrapped versions (S & S2) of the Shelta compiler.
 @echo on
-call bin\shelta 86 prj\sheltas
-copy prj\sheltas.com bin\sheltas.com
-call bin\shelta s prj\sheltas
-copy prj\sheltas.com bin\sheltas2.com
-call bin\shelta s2 prj\sheltas
-diff prj\sheltas.com bin\sheltas2.com
-del prj\sheltas.com
+call bin\shelta n eg\sheltas
+copy eg\sheltas.com bin\sheltas.com
+call bin\shelta s eg\sheltas
+copy eg\sheltas.com bin\sheltas2.com
+call bin\shelta s2 eg\sheltas
+del eg\sheltas.com

File bin/shelta.bat

-@echo off
-REM SHELTA.BAT v2002.1208 (c)2002 Cat's-Eye Technologies.
+REM @echo off
+REM SHELTA.BAT
+REM v1.1.2009.0307 (c)1999-2009 Chris Pressey, Cat's-Eye Technologies.
 REM A 'make'-like utility for Shelta compilers, as an MS-DOS batch.
 
 REM -- Change the following lines to tailor what libraries are
 REM -- included by default.  See readme.txt
-type lib\8086\8086.she >s
-type lib\8086\gupi.she >>s
-type lib\8086\dos.she >>s
-type lib\8086\string.she >>s
-type lib\gupi\linklist.she >>s
+type lib\8086\8086.she >s.she
+type lib\8086\gupi.she >>s.she
+type lib\8086\dos.she >>s.she
+type lib\8086\string.she >>s.she
+type lib\gupi\linklist.she >>s.she
 
-REM -- This section builds the source file, always called 'S'.
+REM -- This section builds the source file, always called 's.she'.
 if not exist %2.she echo Can't find project file %2.she!
-if exist %3.she type %3.she >>s
-if exist %4.she type %4.she >>s
-if exist %5.she type %5.she >>s
-if exist %6.she type %6.she >>s
-if exist %7.she type %7.she >>s
-if exist %8.she type %8.she >>s
-if exist %9.she type %9.she >>s
-if exist %2.she type %2.she >>s
-type null.txt >>s
+if exist %3.she type %3.she >>s.she
+if exist %4.she type %4.she >>s.she
+if exist %5.she type %5.she >>s.she
+if exist %6.she type %6.she >>s.she
+if exist %7.she type %7.she >>s.she
+if exist %8.she type %8.she >>s.she
+if exist %9.she type %9.she >>s.she
+if exist %2.she type %2.she >>s.she
+type bin\null.txt >>s.she
 
-bin\shelta%1.com <s > %2.com
+rem bin\shelta%1.com <s.she
+bin\shelta%1.com <s.she >%2.com
 
 if errorlevel 32 echo Source file could not be opened.
 if errorlevel 16 echo Error - Unknown identifier in source file.
-del s
+del s.she

File bin/sheltan.com

Binary file added.

File doc/nasm2009.txt

+Well, here we are, ten years later.
+
+What brought me back here was the fact that no one uses Turbo Assembler
+anymore.  I'm not even sure if Borland is around anymore.  And I started
+thinking, well, I use NASM these days; maybe I should translate the 8086
+assembly version of Shelta into NASM.  It's free, and tinkers like free.
+Plus Ben recommended it, way back when it was something like version 0.98.
+I said I'd wait until it was past version 1.0.  Well, it's 2.mumble now, 
+so it's high time, right?
+
+So I started translating, and I discovered just how much more explicit
+NASM is.  I was aiming to reproduce the same SHELTA86.COM file, or at
+least one of the same length with the labels in the same places.  I had
+to go to some lengths to stop NASM from inserting redundant ds:
+segment references, and from padding the start of the data segment to a
+word boundary (I just left out the data segment directive entirely.)
+
+But then, when I got it all nice and translated -- Shock!  Horror!
+I discovered the awful truth: shelta86.com cannot compile sheltas.she.
+
+Where did I get the nerve to say that I had bootstrapped a half-kilobyte
+compiler?  Misleading at best!  I had bootstrapped a probably about 555-
+byte compiler.  I then butchered the language it compiled -- removing
+three instruction forms -- so that I could shove that compiler into half
+a kilobyte.  At no time did I actually bootstrap the <512-byte version.
+No, that would have required rewriting shelta.she to have a lot of blocks
+with temporary names that were only pushed once, and other garbage like
+that, so that the stripped-down compiler wouldn't choke on it.  One of
+the nice things about (the full) Shelta, I think, is that while it is
+small, it doesn't force you to wallow in garbage.  At least not a lot.
+
+So I screwed up my courage, cracked my knuckles, and tried to live up to
+my own hype.  I re-instated the string (`) and push-pointer-anonymously
+(]) functions, which bumped the size back up to around 555 bytes.  I
+didn't bother with the push-named-pointer (]Name) form, because it's not
+used in sheltas.she.  OK, so neither are strings, but a lot of the other
+example Shelta programs use strings, so I thought they would be good to
+have.
+
+I then proceeded to squish the living daylights out of the new, NASM-
+language shelta86.s.  Mostly this involved long, hard looks at the logic
+and detailed liveness analysis (done by hand, of course.)  There were a
+few small tweaks that were easily done; for example, removing one or two
+instructions that were completely unnecessary, and replacing the jmp
+in the handler dispatch with a call (several ret statements take up less
+space than several jmps back to the top of the loop.  Who cares about
+wasting space on the stack?)  The most significant savings, though, came
+from factoring out some code to write a push instruction and calling an
+existing routine for it instead, and from shuffling registers to keep dx
+free long enough so that it, instead of a memory location, could be used
+to store one of the crucial computed pointers.  The result: a 509-byte
+executable which did all that the old shelta86.com could do *and* enough
+more to actually compile sheltas.she!
+
+The old shelta86.com is still in the distribution, for comparison, or
+nostalgia, or completeness, or whatever.  The new executable is called
+sheltan.com (for Shelta in NASM, I suppose.)  The bootstrapping and
+driver scripts have been changed slightly to accommodate this
+newfangleness.  I haven't touched the other documentation, which is now slightly
+inaccurate but still quite useful.
+
+Happy bootstrapping!
+Chris Pressey
+March 7, 2009
+Bellevue, WA
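
The jmp-to-call change described in nasm2009.txt can be checked by counting bytes. The Python sketch below is only an illustration and is not part of the commit. The instruction sizes are the standard 8086 encodings. The exit count of 4 matches the handler-exit ret instructions visible in the new listing (BeginBlock, .WName, String, CheapTrick); how many jumps the old shelta86.com had, and whether they were short or near, is not shown here, so both cases are treated as assumptions.

```python
# Sizes in bytes of the 8086 encodings involved.
RET = 1        # C3
JMP_SHORT = 2  # EB rel8
JMP_NEAR = 3   # E9 rel16
CALL_REG = 2   # FF /2, e.g. call ax
JMP_REG = 2    # FF /4, e.g. jmp ax


def dispatch_cost(exits, use_call, back_jump=JMP_SHORT):
    """Bytes spent on dispatching to a handler and getting back to the loop.

    exits     -- number of places where a handler finishes (assumed figure)
    use_call  -- True: 'call ax' + one 'jmp short' after it + a ret per exit
                 False: 'jmp ax' + a jump back to the loop top per exit
    back_jump -- size of each jump back when use_call is False
    """
    if use_call:
        return CALL_REG + JMP_SHORT + exits * RET
    return JMP_REG + exits * back_jump


if __name__ == "__main__":
    print(dispatch_cost(4, use_call=True))                       # call/ret scheme
    print(dispatch_cost(4, use_call=False))                      # short jumps back
    print(dispatch_cost(4, use_call=False, back_jump=JMP_NEAR))  # near jumps back
```

With four exits the call/ret scheme costs 8 bytes against 10 for short jumps and 14 for near jumps, at the price of one return address on the stack per dispatch.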

File src/shelta86.s

+;  shelta86.s
+;  v1.1.2009.0307
+;  (c)2009 Chris Pressey, Cat's-Eye Technologies.
+
+;  Implements an assembler/compiler for the Shelta language, in 8086 machine
+;  language, in the format of the NASM assembler.
+
+;  * Special thanks to Ben Olmstead (BEM) for his suggestions for how to
+;    reduce SHELTA86.COM's size even further.
+
+org             0100h
+bits            16
+cpu             8086
+
+;-------------- Code
+
+; Main program.
+
+WhileFile:
+
+; ----- begin scanning token
+
+                call    word ScanChar   ; get char -> al
+                or      al, al
+                jz      EndFile
+                cmp     al, 32
+                jbe     WhileFile       ; repeat if char is whitespace
+
+                mov     di, token
+                cld
+
+.TokenLoop:     stosb                   ; put char in token
+                call    word ScanChar   ; get char
+                cmp     al, 32
+                ja      .TokenLoop      ; repeat if char is not whitespace
+
+                mov     byte [di], 0    ; null-terminate the token
+
+; ----- end scanning token
+
+                mov     si, token + 1
+
+                mov     al, [token]
+                sub     al, '['
+                cmp     al, 5
+                ja      .Unroll
+                xor     ah, ah
+                shl     ax, 1
+                xchg    bx, ax
+                mov     ax, [ttable + bx]
+                call    ax              ; call handler as listed in ttable
+                jmp     short WhileFile
+
+.Unroll:        dec     si              ; start at first character of token
+                call    word LookupSymbol ; destroys DI & SI, but that's OK
+
+                ; copy cx bytes from ax to codeh
+
+                xchg    ax, si
+                mov     di, [codeh]     ; use di to track codeh
+                rep     movsb
+
+UpCodeH:        mov     [codeh], di
+                jmp     short WhileFile
+
+EndFile:        ; put in a jump over the safe area
+
+                mov     di, token       ; re-use token
+                mov     al, 0e9h
+                stosb
+                mov     ax, [safeh]
+                sub     ax, safe - 1
+                stosw
+                mov     al, 090h
+                stosb
+
+                mov     cx, 4
+                mov     dx, token
+                call    word WriteIt
+
+                ; make the first word of the safe area an offset
+                ; to just past the last word of the code 
+
+                mov     cx, [safeh]
+                mov     dx, safe
+                sub     cx, dx
+                mov     ax, cx
+                add     ax, [codeh]
+                sub     ax, codeadj
+                mov     [safe], ax
+
+                call    word WriteIt
+
+                mov     cx, [codeh]
+                mov     dx, code
+                sub     cx, dx
+                call    word WriteIt
+                
+                xor     al, al
+
+GlobalExit:     mov     ah, 4ch         ; exit to DOS
+                int     21h
+
+WriteIt:
+                mov     ah, 40h
+                mov     bx, 1
+                int     21h
+                jnc     .OK
+                mov     al, 32
+                jmp     short GlobalExit
+.OK:            ret
+
+; -------------------------------- HANDLERS --------------------------- ;
+; When coming into any handler, di will equal the address of the null
+; (that is, the number of characters in the token + offset token)
+
+; ==== [ ==== BEGIN BLOCK ==== ;
+
+BeginBlock:     mov     di, [stach]     ; push [ onto stack
+                mov     ax, [codeh]
+                stosw                   ; store ax at [di]; di += 2
+                mov     [stach], di
+                ret
+
+; ==== ] ==== END BLOCK ==== ;
+
+EndBlock:       dec     di              ; di left over from scanning token
+
+                mov     bx, di          ; di now free to hold something until .WName
+                sub     bx, si          ; get length of token
+                mov     [toklength], bx ; store it for later
+
+                mov     ax, [safeh]
+                mov     [safestart], ax
+                mov     dx, ax          ; dx = namestart initially = safestart = safeh
+                xchg    ax, di          ; di now holds safe area head location
+
+                sub     word [stach], byte 2
+                mov     bx, [stach]     ; pop [ from stack
+                mov     ax, [bx]        ; ax = codeh when [ happened
+
+                mov     bp, [codeh]     ; find length
+                sub     bp, ax          ; bp = length of code between [ ... ] (codeh - old codeh)
+
+                cmp     word [stach], stac
+                je      .StackEmpty
+
+                mov     cx, [bx - 2]     ; cx = contents popped from stack
+
+                ; namestart:dx = namestart:dx - (contents:cx - blockstart:ax)
+
+                sub     cx, ax
+                sub     dx, cx
+
+.StackEmpty:    cmp     byte [si], ':'  ; si still = offset token + 1
+                jne     .PreCopy
+
+                mov     di, [macrh]     ; copy into macro area instead of safe area if :
+                mov     dx, di
+
+                ; copy everything from ax to codeh into the di area
+
+.PreCopy:       push    ax
+                mov     cx, bp
+                push    si
+                xchg    si, ax
+                rep     movsb
+                pop     si
+                pop     ax
+
+                ; restore codeh back to old codeh before [
+
+                mov     [codeh], ax
+                cmp     byte [si], ':'  ; si still = offset token + 1
+                jne     .UpdateSafe
+
+                mov     [macrh], di
+                jmp     short .NameIt
+
+.UpdateSafe:    mov     [safeh], di
+
+                ; write push instruction if '=' or ':' not used
+
+                cmp     byte [si], '='  ; si still = offset token + 1
+                je      .NameIt
+
+                mov     ax, [safestart]
+                sub     ax, safeadj
+
+                mov     di, [codeh]      ; di no longer contains macrh/safeh
+                jmp     short WritePush
+
+                ; insert namestart into dictionary
+
+.NameIt:        mov     cx, dx
+                mov     ax, [toklength]
+
+                inc     si
+
+.WName:         ; Insert token into the symbol table.
+                ; DESTROYS: DI
+                ; INPUT:    si = pointer to token text
+                ;           ax = length of token text
+                ;           cx = pointer to data associated with token
+                ;           bp = length of data associated with token
+
+                mov     di, [symth]     ; di no longer contains macrh/safeh
+                add     ax, 6           ; 1 word for length, 1 for ptr, 1 for data length
+
+                stosw                   ; place ax length in symt
+
+                sub     ax, 6
+                xchg    cx, ax          ; cx <- ax; ax <- cx
+                stosw                   ; place cx (ptr to data)
+                xchg    ax, bp          
+                stosw                   ; place bp (data length)
+
+                rep     movsb
+
+                mov     [symth], di
+
+                ret
+
+; ==== ^ ==== PUSH POINTER ==== ;
+
+PushPointer:    call    LookupSymbol    ; destroys di & si, should be OK
+
+                sub     ax, safeadj
+                mov     di, [codeh]
+                jmp     short WritePush
+
+; ==== ` ==== STRING ==== ;
+
+String:         mov     di, [codeh]
+.Loop:          mov     al, [si]
+                stosb
+                inc     si
+                cmp     byte [si], 0
+                jne     .Loop
+                mov     [codeh], di
+                ret
+
+; ==== _ ==== LITERAL BYTE ==== ;
+
+LiteralByte:    cmp     byte [si], '_'
+                je      LiteralWord
+                cmp     byte [si], '^'
+                je      LiteralSymbol
+                call    DecipherDecimal ; sets DI to [codeh]
+                jmp     short GnarlyTrick
+
+; ==== __ ==== LITERAL WORD ==== ;
+
+LiteralWord:    inc     si
+                call    DecipherDecimal ; sets DI to [codeh]
+FunkyTrick:     stosw
+                jmp     short CheapTrick
+
+; ==== _^ ==== LITERAL SYMBOL ==== ;
+
+LiteralSymbol:  inc     si
+                call    LookupSymbol    ; destroys DI & SI, that's OK
+
+                sub     ax, safeadj
+
+                mov     di, [codeh]
+                jmp     short FunkyTrick
+
+; ==== \ ==== PUSH WORD ==== ;
+
+PushWord:       call    DecipherDecimal ; sets DI to [codeh]
+
+WritePush:      mov     byte [di], 0b8h ; B8h, low byte, high byte, 50h
+                inc     di
+                stosw
+                mov     al, 50h
+GnarlyTrick:    stosb
+CheapTrick:     mov     [codeh], di
+                ret
+
+; -------------------------------- SUBROUTINES --------------------------- ;
+
+DecipherDecimal:
+                ; INPUT: si = address of token
+                ; OUTPUT: ax = value, di = codeh
+                ; uses and destroys DI
+
+                xor     di, di
+
+.Loop:          lodsb
+
+                mov     bx, di
+                mov     cl, 3
+                shl     bx, cl
+                mov     cx, di
+                shl     cx, 1
+                add     bx, cx
+
+                sub     al, '0'
+                cbw
+                add     bx, ax
+                mov     di, bx
+
+                cmp     byte [si], '0'
+                jae     .Loop
+
+                xchg    ax, di
+                mov     di, [codeh]
+                ret
+
+; Scans a single character from the input file, placing
+; it in register al, which will be 0 upon error
+; or eof (so don't embed nulls in the Shelta source...)
+
+ScanChar:
+                mov     ah, 7           ; read from stdin one byte
+                int     21h
+                cmp     al, ';'         ; check for comment
+                je      .Comment
+                ret
+.Comment:       mov     ah, 7           ; read from stdin one byte
+                int     21h
+                cmp     al, ';'         ; check for end of comment
+                jne     .Comment
+                jmp     short ScanChar
+
+LookupSymbol:
+                ; INPUT:  si = address of symbol to find, di = address of null termination
+                ; OUTPUT: ds:ax = pointer to contents (exits with errorlevel 16 if not found)
+                ; cx = length of contents
+
+                mov     bx, symt        ; bx starts at symbol table
+                mov     bp, si
+                sub     di, si
+
+.Loop:          mov     ax, [bx]        ; first word = token size
+
+                mov     dx, bx          ; keep track of start of this symt entry
+
+                sub     ax, 6
+                cmp     ax, di
+                jne     .Exit           ; if it doesn't fit, you must acquit
+
+;   exit if right token
+
+                xor     si, si          ; reset si to token
+.Inner:         mov     al, [bx + 6]    ; get byte from bx+6=pointer to token text
+                cmp     [bp + si], al   ; compare to si=token
+                jne     .Exit
+                inc     bx
+                inc     si
+                cmp     si, di          ; hit the length yet?
+                jb      .Inner          ; no, repeat
+
+                ;   a match!
+
+                mov     bx, dx
+                mov     cx, [bx + 4]    ; third word = data length
+                mov     ax, [bx + 2]    ; second word = data ptr 
+                ret
+
+.Exit:          mov     bx, dx
+                mov     ax, [bx]
+                add     bx, ax
+                cmp     bx, [symth]
+                jb      .Loop
+
+                mov     al, 16          ; return 16 if unknown identifier
+                jmp     GlobalExit
+
+;-------------- Initialized Data
+
+symth:          dw      symt
+codeh:          dw      code
+stach:          dw      stac
+safeh:          dw      safe + 2
+macrh:          dw      macr
+
+ttable:         dw      BeginBlock, PushWord, EndBlock, PushPointer, LiteralByte, String
+;                       [           \         ]         ^            _            `
+
+;-------------- Uninitialized Data
+
+section .bss
+
+token:          resb    128
+
+safestart:      resw    1
+toklength:      resw    1
+
+safe:           resb    16384
+symt:           resb    16384   ; 16K + 16K = 32K
+code:           resb    4096
+macr:           resb    4096    ; + 8K = 40K
+stac:           resb    256
+
+;-------------- Equates
+
+safeadj         equ     (safe - 0104h)
+codeadj         equ     (code - 0104h)
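
Three small pieces of the listing above can be restated in a higher-level form: the decimal parser (DecipherDecimal), the bytes emitted by WritePush, and the symbol table entry layout used by .WName and LookupSymbol. The Python below is a minimal sketch for illustration only and is not part of the commit. The function names are hypothetical. It assumes a non-empty token, and it returns None for an unknown symbol where the assembly exits with errorlevel 16.

```python
import struct


def decipher_decimal(token: str) -> int:
    """Accumulate digits as value*8 + value*2 + digit, wrapped to 16 bits.

    Like the listing, this stops at the first character below '0' and does
    not reject characters above '9'.
    """
    value = 0
    for ch in token:
        if ch < "0":
            break
        digit = ord(ch) - ord("0")
        value = ((value << 3) + (value << 1) + digit) & 0xFFFF
    return value


def encode_push(value: int) -> bytes:
    """B8 lo hi = mov ax, imm16; 50 = push ax."""
    return bytes([0xB8]) + struct.pack("<H", value & 0xFFFF) + bytes([0x50])


def make_entry(name: bytes, data_ptr: int, data_len: int) -> bytes:
    """One symbol table entry: three little-endian words, then the name.

    word 0: total entry size (name length + 6)
    word 1: pointer to the data
    word 2: length of the data
    """
    return struct.pack("<HHH", len(name) + 6, data_ptr, data_len) + name


def lookup(table: bytes, name: bytes):
    """Walk the entries by adding each entry's size word to the position."""
    pos = 0
    while pos < len(table):
        size, data_ptr, data_len = struct.unpack_from("<HHH", table, pos)
        if table[pos + 6 : pos + size] == name:
            return data_ptr, data_len
        pos += size
    return None


if __name__ == "__main__":
    table = make_entry(b"foo", 0x0200, 3) + make_entry(b"ba", 0x0300, 5)
    print(decipher_decimal("1234"))
    print(encode_push(0x1234).hex())
    print(lookup(table, b"ba"))
```

Storing the total entry size in the first word lets the lookup skip a non-matching entry with a single addition, which is what the .Exit path in LookupSymbol does.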