Navigation
Home & news
Random page
All pages
Site search:
Databases
Fortune cookies
Haikus
SID themes
Page collections
Blag
Chip music
Chipophone
Games
Hardware projects
Music downloads
Obfuscated programming
Piano music
Sane programming
Scene productions
SID related pages
Software downloads
Video downloads
Featured pages
A Chipful of Love For You
Autosokoban
Beagleboard VGA
Binary Art
Brainfuck
Chipophone
Chopin romance
Craft
Elements of Chip Music
Fratres
GCR decoding on the fly
Hardsync
Kernighan's lever
Klämrisk Hero
Live at LCP 2011
Obsessbook
Parallelogram
Phasor
Pipe Logic
Poems for bugs
Rasp64
Reverberations
Scene Spirit
Sidreloc
Specular Highlight
Spindle
TTY demystified
We learn the nibbles
Vim + ^Z
Zeugma
Don't miss

Delta
Forum
Register
Log in
Latest comments
Syndication
RSS feed
Feedback
  • Swedish content
  • Personal content
  • Offensive content

Voyage, travel, and change of place impart vigor.

— Seneca the Younger, ca 4 BC – 65 AD

Sidreloc

Sidreloc is a tool for relocating SID tunes any number of whole pages, using a novel but remarkably simple algorithm. It can also relocate all zero-page variables used by the tune.

Sidreloc is capable of relocating 91% of HVSC #56 out of the box. Many of the remaining tunes can be relocated by tweaking the settings to fit them.

Download

Here's the manual page for sidreloc.

While the tool is SID specific, the fundamental algorithm could easily be adapted to relocate all sorts of 6510/6502 machine code that meets a couple of requirements (see "Observations" below). Sidreloc is primarily a linux program, but it's open source (MIT license) and you are encouraged to port it to different operating systems. Good patches will be merged into the official version.

Background

SID tunes are music files for the Commodore 64 home computer. They are typically 3-4 KB large, and consist of a small header followed by executable 6510 machine code. The header contains a 16-bit load address; the code must be loaded at this address, and then executed. This can be done in an emulator running on a modern computer (e.g. sidplay) or on a real C64.

Of the over 40000 SID tunes found in the excellent HVSC, many have been ripped from games or demos, and the original source code is long lost. Some tunes were written in assembly language, and then linked to run at a particular address. Others were programmed in machine code monitors, where addresses are hardcoded from the start. Yet others were made using interactive tools such as trackers, but then exported into executable code linked to run at some given load address.

The following charts show a breakdown of all SID tunes in the HVSC based on load address and size.

(Note: BASIC tunes have been omitted here and in the following.)

Unfortunately, it is sometimes difficult to combine an existing SID tune with other code, such as accompanying a new demo effect with a historical SID tune. Demo effects often rely on bank switching and other address-related trickery, and need to occupy certain memory regions. On the other hand, SID tunes are just programs, and should be able to execute from any place, if it weren't for the irritating little fact that they're already linked.

The problem

Once linked, it is generally very difficult to relocate (move) a piece of machine code. This is especially true for 6510/6502 code. To illustrate, here are two small routines that implement the same functionality: Copying a 2 KB chunk of memory from the current program into the area $3800-$3fff. This is something you might do in order to set up a font for a C64 program.

Self-modifying version:

        * = $1000       ; This informs the assembler that we'll be loaded at $1000

        ldy     #8              ; 8 iterations of the outer loop
        ldx     #0              ; 256 iterations of the inner loop
loop
        lda     font,x          ; Copy one byte...
        sta     $3800,x         ; ...into the right place.
        inx
        bne     loop

        dey
        beq     done

        inc     loop+2          ; For each iteration of the outer loop, modify the load
        inc     loop+5          ; and store addresses.
        jmp     loop
done
        rts

font
        ; 2 kb of data follows...

If you are unfamiliar with 6510 code, this might seem utterly terrifying, but it's standard practice. The index registers (X and Y) are 8 bits wide, and there's no obvious way to loop efficiently over a large array of bytes without resorting to self-modifying code.

This is what we get if we run the code through an assembler and a linker, and then a disassembler:

1000 a0 08      ldy #$08
1002 a2 00      ldx #$00
1004 bd 1a 10   lda $101a,x
1007 9d 00 38   sta $3800,x
100a e8         inx
100b d0 f7      bne $1004
100d 88         dey
100e f0 09      beq $1019
1010 ee 06 10   inc $1006
1013 ee 09 10   inc $1009
1016 4c 04 10   jmp $1004
1019 60         rts
101a ...

The branches are relative, but the jmp uses an absolute address which depends on where in memory the program is located. To relocate this code, we must change the operand of the jmp instruction. When relocating a whole number of pages, we only need to change the most significant byte (MSB) of the jmp address, i.e. the $10 byte at $1018. For similar reasons, we need to modify the bytes at $1006, $1012 and $1015. However, we should not change the byte at $1009, because the font should still be copied into $3800.

Note, by the way, that all the changed bytes happen to have the value $10. This is because all of the code fits into one page of memory. If the code were bigger, the font data would be pushed into the next page, and the byte at address $1006 would be $11, but we'd still need to change it.

So far, all of the addressing modes were based on literal 16-bit addresses. But now consider the following code:

Explicit pointer version:

        * = $1000

        lda     #<font          ; a0/a1 keep track of the reading position.
        sta     $a0
        lda     #>font
        sta     $a1

        lda     #$00            ; a2/a3 keep track of the writing position.
        sta     $a2
        lda     #$38
        sta     $a3

        ldy     #0		; Y will remain zero throughout the routine.
loop
        lda     ($a0),y         ; Copy a byte.
        sta     ($a2),y

        lda     $a0             ; Increment the reading pointer.
        clc
        adc     #$01
        sta     $a0
        lda     $a1
        adc     #$00
        sta     $a1

        lda     #$01            ; Increment the writing pointer.
        clc
        adc     $a2
        sta     $a2
        lda     #$00
        adc     $a3
        sta     $a3

        lda     $a0             ; Check if we're done.
        cmp     #<fontend
        bne     loop
        lda     $a1
        cmp     #>fontend
        bne     loop

        rts
font
        ; 2 kb of data follows...
fontend

This routine is not particularly fast, but at least it's not self-modifying. However, it turns out to be much more tricky to relocate.

This is what the disassembled code looks like:

1000 a9 3d      lda #$3d
1002 85 a0      sta $a0
1004 a9 10      lda #$10
1006 85 a1      sta $a1
1008 a9 00      lda #$00
100a 85 a2      sta $a2
100c a9 38      lda #$38
100e 85 a3      sta $a3
1010 a0 00      ldy #$00
1012 b1 a0      lda ($a0),y
1014 91 a2      sta ($a2),y
1016 a5 a0      lda $a0
1018 18         clc
1019 69 01      adc #$01
101b 85 a0      sta $a0
101d a5 a1      lda $a1
101f 69 00      adc #$00
1021 85 a1      sta $a1
1023 a9 01      lda #$01
1025 18         clc
1026 65 a2      adc $a2
1028 85 a2      sta $a2
102a a9 00      lda #$00
102c 65 a3      adc $a3
102e 85 a3      sta $a3
1030 a5 a0      lda $a0
1032 c9 3d      cmp #$3d
1034 d0 dc      bne l1012
1036 a5 a1      lda $a1
1038 c9 18      cmp #$18
103a d0 d6      bne l1012
103c 60         rts
103d ...

To relocate this, our algorithm needs to be able to figure out that the operand of the second lda (the byte at $1005) should be relocated, because it will be stored into $a1 and later end up as the MSB of an address, as part of an indirect addressing mode. But indirect addressing could just as well be performed on e.g. addresses that have been computed as the sum of a relocatable base address and a non-relocatable offset, as in the subsequent iterations of the loop. Furthermore, the operand of one of the cmp instructions (the $18 byte at $1039) must also be relocated, otherwise the loop won't terminate properly.

This second example also illustrates another peculiarity of 6510 code: Zero-page addresses are not only fast, but also required for indirect addressing. Hence SID tunes tend to use at least a couple of them. On the C64, the four bytes at $fb-$fe are documented as "Free Zero Page space for User Programs", so they get used a lot, increasing the probability of a collision if we try to combine an existing SID tune with some other code.

Thus we also need to be able to relocate zero-page accesses. Like regular pointers, zero-page addresses may be computed and stored into future instruction operands as part of self-modifying code.

So far, we have seen that pointers may appear inside 16-bit operands, but also in separate 8-bit operands spread out all over the code. Finally, pointers may appear in data. SID tunes typically contain tables of pointers to data structures called patterns. These pointers are either stored as regular words in little-endian order, or split into one LSB table and one MSB table.

Observations

The approach taken in sidreloc hinges on the following crucial observations:

  • SID tunes have a well-defined API: We're supposed to load the tune into memory, call an initialisation routine once, then call a playroutine regularly at some rate (e.g. the video frame rate). Some SID tunes also install a secondary timer interrupt which is called very rapidly in order to play samples ("digis").

    All of this can be emulated, and we can make reasonable assumptions about how many calls we need to make to these routines in order to exercise all of the code.

  • In practice, it is enough to be able to relocate the tune a whole number of pages in either direction. Thus we only need to be concerned with address MSBs.

  • The MSB of almost every pointer that is computed in a 6510 program is the sum of:

    • one or more program bytes (bytes that appear verbatim somewhere in the linked machine code), of which exactly one is the MSB of an address, and
    • possibly some offset that is independent of the program location (such as a computed index into a table).
  • The output of a relocation algorithm is a set of relocation flags, one for each program byte, expressing whether or not the relocation offset (in pages) should be added to this particular byte.

    (The same principle applies to zero-page relocation, but some extra book-keeping is needed; check the source code for details.)

  • It is not possible to write a perfect algorithm that works for every SID tune, because a playroutine could theoretically e.g. read out the current program counter and store it into a critical SID register like $d404. Being able to relocate most SID tunes is good enough.

  • We can easily check whether the algorithm worked for a particular SID tune by emulating the original code and the relocated code side by side, and comparing the contents of the SID registers after each call to the playroutine.

The last two observations taken together also imply that we don't have to be conservative in what we do; if we're unsure about something, we make an educated guess that is correct in most cases, and later check if it was true for this particular tune.

The algorithm

The trick is to emulate a 6510 processor and a 64 KB memory, but inside every memory cell (including the registers) we not only track the current 8-bit value, but also the list of program bytes which contributed to this value. Whenever an effective address is computed inside the emulated processor, we check whether this address is in the relocation range (e.g. the extent of the program), and if so, we note down the fact that exactly one of these program bytes must be relocated.

When a byte is copied e.g. from memory into a register, not only is the value copied, but the list of contributing program bytes is copied along with it. When a sum is computed (adc instruction), the result is not only the computed sum, but also the concatenation of the lists of program bytes contributing to both operands. When other computations are performed, the resulting value is associated with an empty list of contributors.

Furthermore, every time a comparison is made (using cmp or eor), we note down that the two lists of program bytes contributing to the operands must have equal relocation status: Either they're both independent of the program location, or they're both dependent of the program location. In the latter case this implies that each list must contain exactly one program byte which is relocated.

Having visited all parts of the code, we thus end up with a collection of facts (equations) relating the relocation flags of program bytes to each other. Then we use a constraint solver to find a solution to this set of equations. Flags of unknown status (corresponding to program bytes which do not appear in any equation) are treated as "don't relocate".

Zero-page

For zero-page relocation, we also need to keep track of which program bytes contribute to which zero-page addresses. We need a bitfield for this, since a program byte often points to more than one zero-page variable: The operand of lda ($a0),y contributes to the effective addresses $a0 and $a1.

We still get equations describing lists of program bytes of which exactly one should be relocated.

Once we have a solution to the set of equations, we visit each program byte that was flagged for relocation. For regular pointer MSBs, we add the main relocation offset. For zero-page addresses, we add the relocation offset for those zero-page addresses to which the program byte contributed. For this to work, some zero-page addresses must of course move together.

Example

When sidreloc is run with the verbose option (-v), it prints a map of all the program bytes. The character R indicates that the relocation offset (difference in pages between the old and new code locations) has been added to this byte. The character Z indicates that some zero-page relocation offset has been added to this byte.

Here's the output you get when relocating Zoids to page $42, with a free zero-page region starting at $80. If you are familiar with Hubbard's playroutine, you'll immediately recognise the pattern pointers MSB table (a large stretch of consecutive R bytes) and the subtune tables (one "===RRR" chunk per subtune, consisting of three track pointers). The first third of the file is code, and the rest is mostly data.

Zoids, Rob Hubbard, 1986 Martech, $1000-$1a7f, 3 subtunes
Relocating from $1000-$5aff to $4200-$8cff
Analysing subtune 1
Analysing subtune 2
Analysing subtune 3
Program map:
1000, 4200:  ==R==R==R==R========R====R==R==R==R=====R==R===?==========?====?
1040, 4240:  ==R==R====R====R==R==R==R===R==R====R=Z==R=Z==R====R...==R==R=Z=
1080, 4280:  ?===?====R==R====R==R==R==R...===R=Z==R=Z=?==R==R=?==R=Z==R==R=?
10c0, 42c0:  ==R=?===R=?===?=====R====R==R====Z====R==R==R==R==Z==R====R==R==
1100, 4300:  R==R=====R==R=====R==R==R==R==R==R======R==R==R==R=====R=====R==
1140, 4340:  ===R=====R=====R==R==R==R==R=Z=?======R==R==R==R==R=?====R====R=
1180, 4380:  ?====?========R======R==R==R==R==R==R==R====R=?=?===?==R==R=====
11c0, 43c0:  R==R==R==R==R===R==R====R==R==R==R==R==R=?=?====R======R==R==R==
1200, 4400:  R==R==R==R==R==R=====R=====R=?====R==R==R==R==R=====R==R====R=?=
1240, 4440:  =R=.==R==R=?==R==R====R===R===R=?=?==?====R==R===R==R===R=?=?==?
1280, 4480:  ====R==R==R===R======R=====R==R==R===?==R==R=?=====R==R==R=====R
12c0, 44c0:  =?==R=====R===R==R==R=====R=?==R=====R=?====R====R====R=?==?==R=
1300, 4500:  =R====R==R=====R=?====R====?=====R=?====R=?======R======R=?====R
1340, 4540:  =.==R==R=====R=?====R=?====R==?==R==R====R==R==R==R=====R=======
1380, 4580:  =R=........................??............????..????????..????...
13c0, 45c0:  .????..?????????????????????????????????????????????????????????
1400, 4600:  ?????????????????????????????????????????????............??....?
1440, 4640:  ?..===???????????????????????????????????..?..??????????????????
1480, 4680:  ????????????????????????????????????????????????????????????????
14c0, 46c0:  ??????????????????????........????????................??????===R
1500, 4700:  RR===RRR===RRR===============================RRRRRRRRRRRRRRRRRRR
1540, 4740:  RRRRRRRRRRRR================================================?===
1580, 4780:  ================================================================
15c0, 47c0:  ================?===============================================
1600, 4800:  =====================================?==.==.==?==.====.====?????
1640, 4840:  ????????????????????????????????????????????????????????????????
1680, 4880:  ????????????????????????????????????????????????????????????????
16c0, 48c0:  ????????????????????????????????????????????????????????????????
1700, 4900:  ????????????????????????????????????????????????????????????????
1740, 4940:  ????????????????????????????????????????????????????????????????
1780, 4980:  ????????????????????????????????????????????????????????????????
17c0, 49c0:  ????????????????????????????????????????????????????????????????
1800, 4a00:  ????????????????????????????????????????????????????????????????
1840, 4a40:  ????????????????????????????????????????????????????????????????
1880, 4a80:  ????????????????????????????????????????????????????????????????
18c0, 4ac0:  ????????????????????????????????????????????????????????????????
1900, 4b00:  ????????????????????????????????????????????????????????????????
1940, 4b40:  ????????????????????????????????????????????????????????????????
1980, 4b80:  ????????????????????????????????????????????????????????????????
19c0, 4bc0:  ????????????????????????????????????????????????????????????????
1a00, 4c00:  ?????????????????????????????????????????????????=====R====R===R
1a40, 4c40:  ==R===?===?=============?====?==R==?==R=........................
MSB relocations       (R): 233
Zero-page relocations (Z): 9
Static bytes          (=): 969
Status undetermined   (?): 1349
Unused bytes          (.): 128
Old zero-page addresses:   fb fc fd fe
New zero-page addresses:   80 81 82 83
Verifying relocated subtune 1
Verifying relocated subtune 2
Verifying relocated subtune 3
Bad pitches:               0, 0%
Bad pulse widths:          0, 0%
Relocation successful.
Largest unused region:     $4d00-$9fff

Limitations

There are a couple of situations where the algorithm fails.

Unrelated memory

This concerns code which accesses memory completely unrelated to the program location. For instance, some SID tunes decrunch data into $e000-$ffff. The relocated tune works great, but it will also access this memory; hence, it can't coexist with demo effects or other tunes which happen to use the same area of memory for data. Sidreloc prints warnings about out-of-bounds memory accesses.

In most cases, such a situation can be resolved by relocating in multiple steps. For instance, suppose a SID tune loads into pages $10-$1e, and also makes use of RAM at pages $90-$93. You want to relocate this tune so that it only affects memory pages in the $e0-$ff range. This is done in two steps: First, you specify -r 10-1e -p 81 in order to create an intermediate SID file where the code is located close to the data area. Then, you specify -r 81-93 -p e0 to move both code and data to the desired location.

Compressed code

The algorithm assumes that addresses are computed from raw program bytes, as they appear in the linked code. If code is read from a compressed stream of bits, there is no way we can add an offset to a set of program bytes and somehow end up with a compressed bit stream which happens to decrunch to relocated code.

Subtraction

Subtraction (sbc instruction) can be used in a couple of different ways:

  1. Subtract an offset from an address, producing an address.

  2. Subtract two addresses, producing a difference.

  3. Subtract two numbers, producing a difference.

At the time when we encounter the sbc instruction, we do not know which case to pick. The current implementation makes the assumption that case 2 never occurs (the choice among cases 1 and 3 can be postponed), since this behaviour seems to be good enough in practice.

Data table overflow

Most SID tunes contain a table that maps note numbers to frequencies, for somewhere around 96 notes (8 octaves). Occasionally the playroutine ends up using a note number which is too high. This might be used for sound effects where the pitch doesn't really matter, or during periods when the voice is playing at zero volume anyway. However, if an address is stored after the table, the MSB of this address might get picked up and written to a SID register. When the code is relocated, the pitch changes. For this reason, sidreloc will verify a relocated tune even if some pitches are wrong, but the amount of bad pitches will be reported.

A similar problem concerns pulse widths. Some playroutines store pulse width information in an instrument data structure, and if an invalid instrument is used, a pointer MSB might get picked up and used as a pulse width (or pulse width delta). In most cases this doesn't affect the perceived quality of the music, because if the composer desired a specific pulse-width (e.g. a non-modulating square wave), then it would've been very improbable for him or her to accidentally achieve this with an invalid instrument number.

Often, pulse width is treated as an integrator, meaning that a delta value is added in each call to the playroutine. When relocation results in a different delta value, this will make all future pulse width values wrong. Hence, the reported amount of bad pulse width values is often very high, even though the music sounds just like the original.

No playroutine

Some SID tunes take over the entire C64, and never return from the initialisation routine. These tunes tend to use up all CPU cycles and nearly all RAM. It should be possible to apply the relocation algorithm here too, by extending the CPU emulation to also include the CIA timers and the raster position register ($d012), and sample the SID registers at some reasonable cycle-interval. However, that would be of limited practical use, because such extreme SID tunes would be very hard to combine with other code anyway.

Evaluation

A script was used to run sidreloc against all (non-BASIC) tunes in HVSC #56, with all printouts redirected to a log file. Other scripts were then used to compile statistics from the log file.

Hints

If you're trying to relocate a SID tune, and the tool fails with the default settings, here are some things you could try:

  • If the verification fails with a lot of bad pitches or pulse widths, try the -f option to force sidreloc to output the file anyway. In many cases there is no audible difference.

  • If you get warnings about memory accesses outside the relocation range, but very close to it, then you should extend the relocation range using the -r option.

  • Try the -k option if you can live without zero-page relocation. This reduces the number of equations, potentially removing the one which was causing trouble.

  • If -k works, but you need to relocate some of the zero-page accesses, then you might get away with adjusting the range of free zero-page addresses (using the -z option) so that some of the addresses end up at their original location, while others are moved. In particular, avoid relocating zero-page addresses which could also be interpreted as single-byte opcodes (e.g. $a8 and $c8).

Posted Monday 28-May-2012 08:13

Discuss this page

Disclaimer: I am not responsible for what people (other than myself) write in the forums. Please report any abuse, such as insults, slander, spam and illegal material, and I will take appropriate actions. Don't feed the trolls.

Jag tar inget ansvar för det som skrivs i forumet, förutom mina egna inlägg. Vänligen rapportera alla inlägg som bryter mot reglerna, så ska jag se vad jag kan göra. Som regelbrott räknas till exempel förolämpningar, förtal, spam och olagligt material. Mata inte trålarna.

Anonymous
Mon 28-May-2012 14:42
Pretty cool. It's amazing how much work has gone into making this work the way it does. You're a great tinkerer!
Anonymous
Mon 4-Jun-2012 17:14
Wow, splendid work LFT!

However, I'm trying to relocate a song from $1000 to $c000. It's only like 9 pages but the tool bugs out on me:

sidreloc -p c0 -v tune_at_1000.sid tune_at_c000.sid
blablabla, $1000-$190b, 1 subtunes
Relocating from $1000-$59ff to $c000-$09ff
sidreloc: Neither the source nor the destination relocation range may overlap with the zero-page.

In the source I spotted the snippet below, but I don't get why you always add 64 pages. Some kind of safety zone?

for(i = 0; i < 64 && reloc_end != 0xcfff && reloc_end != 0xffff; i++) {
reloc_end += 0x100;
}


Regards
P-a Bäckström
Anonymous
Mon 4-Jun-2012 17:17
I should add that it works as intended if I specify the range with the -r option.'

More regards
P-a Bäckström
Anonymous
Tue 5-Jun-2012 20:32
Fantastic tool!

The extended end address poses a few problems though. The extended area isn't allowed to overlap zp so relocating to high addresses (under kernal) usually fails.

Note also that your particular example of Zoids is already relocated in HVSC (as are probably many others). The original player is at $c500.

/tlr
Anonymous
Tue 5-Jun-2012 20:34
...and I should add that my first point was exactly what P-A reported before me. :)

/tlr
Anonymous
Fri 26-Oct-2012 23:03
Sometimes I wonder what is your timezone. I mean, how it is possible to combine all the daily duties and so much work on such "trivial" things (no offence here). Well, anyhow, it's brilliant.
Anonymous
Wed 8-May-2013 18:40
Rather than parsing the replay routine and doing all this analysis, would it not have been cheaper to just extract the data, and write one's own replay routine?

I mean, most of the $1000-$1003 tunes were made in a limited set of music programs, were they not? And $1800 tunes, there is only one culprit, Future Composer, whose format should be well understood, well enough to construct one's own relocatable replay routine (if using Future Composer's own relocation capabilities is not desired). That should cover 90% of tunes out there, should it not?
Anonymous
Wed 8-May-2013 18:44
What C=64 code really needs is support for executable and linking format ("ELF"). Writing ELF headers from Turbo Assembler is no problem, but something should be able to read and parse the global offset table ("GOT"), which is exactly what is needed here for all these SID tunes (not the data itself, of course, but for a truly relocatable replay routine).

So maybe a little "shim" routine which would act as the runtime linker for dynamic objects, ld.so.1?

Hmmm, that might be a cool next project for me to implement, hmmmmm...