Answers to your coding questions…

A site about answers… not questions

Obviously, a sprite is a small creature, normally winged, that tends to flit and flutter about the place and do generally spritey things.

What does that have to do with computer graphics? Not much really; however, sprites also happen to be the name given to a certain hardware feature often exploited for game development.

There are other terms for sprites, but we are not going to get into a history lesson. It is enough to know that anything that moves or animates independently of the main background image is referred to as a sprite.

If you have hardware which can render these little features then your job as a computer game programmer is not only easier but significantly more efficient.

As it happens, the DS has one of the better sets of sprite rendering hardware available for a 2D console. The answers you will find in this section are all about how to convince the DS hardware to display and animate our beloved game characters.

Posted by dovoto


In yesterday’s chapter we talked at length about how to compose an image on screen by manipulating each pixel until our scene was formed. While this gave us a great deal of control over the final result we quickly realized the DS is not quite a software rendering powerhouse.

To compensate for a relatively slow processor and an even more limiting amount of VRAM, the DS includes several hardware features which together make up arguably the most advanced dedicated 2D processing system ever placed in a video game console.

Tile based rendering is the key component of this 2D technology and understanding it will allow you to squeeze enormous, detailed, and fully interactive worlds from the seemingly limited DS resources.

Tile Modes

What are tile-based graphics? Put simply, they describe your scene using a mapping of tile indexes to tile graphics. Instead of describing the screen as a 2D matrix of pixels, we are going to describe it as a matrix of tiles where each tile represents a small bitmap. Let us look at one of the better-known tile based games and get a feel for how it was put together.

Below are all the graphics used to construct the entire overworld of the original Zelda.

You may recognize these little 16×16 chunks as pieces of the Zelda world and perhaps you could imagine that in order to describe the look of the overworld all one would have to do is store which tile goes where. For instance the following familiar scene could be represented by an array of tile numbers.

Such as this:

To render the scene all you would have to do is loop through the indexes stored in the map array and use those values to blit each tile to the screen. The entire world map would only end up being a few KBs in size instead of the megabytes it would take to store it as one big image. The trade-off, of course, is an increase in rendering time, because we have to convert between this map and the final bitmap we want on the screen.
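The loop described above can be sketched in C. This is a host-side illustration only, assuming 8×8 tiles stored one byte (palette index) per pixel and a plain array standing in for the screen; blit_tile and render_map are hypothetical names, and on the DS the 2D hardware does this work for us.

```c
#include <stdint.h>
#include <string.h>

#define TILE_W 8
#define TILE_H 8

/* Copy one 8x8 tile (one byte per pixel) into a linear destination
   buffer that is dest_w pixels wide, top-left corner at (px, py). */
static void blit_tile(uint8_t *dest, int dest_w,
                      const uint8_t *tile, int px, int py)
{
    for (int row = 0; row < TILE_H; row++)
        memcpy(&dest[(py + row) * dest_w + px], &tile[row * TILE_W], TILE_W);
}

/* Walk the map, look up each tile index, and blit that tile into place. */
static void render_map(uint8_t *dest, const uint8_t *map,
                       int map_w, int map_h, const uint8_t *tiles)
{
    int dest_w = map_w * TILE_W;
    for (int ty = 0; ty < map_h; ty++) {
        for (int tx = 0; tx < map_w; tx++) {
            int index = map[ty * map_w + tx];
            blit_tile(dest, dest_w, &tiles[index * TILE_W * TILE_H],
                      tx * TILE_W, ty * TILE_H);
        }
    }
}
```

Every frame of a software renderer would repeat this copy for the whole screen, which is exactly the cost the tile hardware removes.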

Fortunately (and hopefully obviously at this point) the NDS 2D hardware is built for just this purpose, making rendering tile based worlds a snap. All we really need to do is create a map and a tile set, place them into 2D video memory, and tell the DS where to find them; it will do the magic for us.

In order to meet this first goal of placing tiles and maps into memory we must know where in memory to place them and in what format the NDS expects this data. This brings us back to the seemingly ever-present task of video memory management and memory layout.

Maps and tiles are laid out in a rather straightforward way. Tiles are simply stored sequentially in video memory. Maps which are 32 tiles wide are stored as a simple linear array, allowing a direct copy from your map data to map video memory.

It turns out you can place maps anywhere in the first 64KB of background memory and tiles anywhere in the first 256KB. You set the location of your map with the “map base” parameter of bgInit (or directly in the background control register, if that is your thing). Map bases are 2KB apart. Since all backgrounds share video memory, you have to make an effort to ensure they don't overlap.

Similar to maps, tiles also have an offset you can set with the “character base” parameter of bgInit; character bases are 16KB apart. Often you will put your maps at the beginning of video memory and your tiles after, because maps can only be stored in the first 64KB of video memory.
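To make the base arithmetic concrete, here is a small sketch that turns a base number into a byte offset from the start of background VRAM (0x06000000 for the main engine) and checks two regions for overlap. The function names are hypothetical helpers for illustration, not libnds API.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAP_BASE_SIZE  (2 * 1024)   /* map bases are 2KB apart        */
#define TILE_BASE_SIZE (16 * 1024)  /* character bases are 16KB apart */

static uint32_t map_base_offset(int base)  { return (uint32_t)base * MAP_BASE_SIZE; }
static uint32_t tile_base_offset(int base) { return (uint32_t)base * TILE_BASE_SIZE; }

/* True if the byte ranges [a, a + a_len) and [b, b + b_len) intersect. */
static bool regions_overlap(uint32_t a, uint32_t a_len, uint32_t b, uint32_t b_len)
{
    return a < b + b_len && b < a + a_len;
}
```

For example, a 32×32 map of 16-bit entries is exactly 2KB, so it fits in one map base; map base 8 starts at 16KB, right on top of character base 1, which is the kind of collision you have to watch for.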

To load a map into memory you pick a block offset, write the map there, then tell the DS where to find it. You then do the same for your tile graphics. Finally you load a palette and call it a day (there may be a few more details).

One thing we need to figure out is the format of tile and map data. It turns out this is rather straightforward.

Map Entries

There are two forms of maps: ones with 8-bit entries and ones with 16-bit entries. The 8-bit entries are simply an offset into character memory. For instance, if you want the third entry in your map to use tile number 4, you just stick a 4 in that map entry.

8-bit indexed maps are used only for “Rotation” backgrounds. “Text” and “Extended Rotation” backgrounds use the more flexible 16-bit entries.

16-bit entries are broken up into a character index and control bits. The low 10 bits hold the index of the character and allow you to address up to 1024 unique characters. The next two bits flip the character horizontally (bit 10) or vertically (bit 11). Finally, the top 4 bits let you choose a palette.

Bits 0-9: tile index
Bit 10: horizontal flip
Bit 11: vertical flip
Bits 12-15: palette index

To create a map you fill an array of 16-bit values (unsigned short) with these entries, and you can control not only which character is rendered to the screen but what palette it uses and whether it is flipped. If you leave the flip and palette bits at zero you can treat each entry as a simple character index (you are still limited to 1024 tiles, though).
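Packing an entry is a job for a few shifts and ORs. The sketch below follows the bit layout listed above; make_map_entry and the ENTRY_* names are hypothetical helpers, not libnds macros.

```c
#include <stdint.h>

#define ENTRY_HFLIP (1u << 10)
#define ENTRY_VFLIP (1u << 11)

static uint16_t make_map_entry(unsigned tile, int hflip, int vflip, unsigned palette)
{
    return (uint16_t)((tile & 0x3FF)                /* bits 0-9: tile index     */
                      | (hflip ? ENTRY_HFLIP : 0u)  /* bit 10: horizontal flip  */
                      | (vflip ? ENTRY_VFLIP : 0u)  /* bit 11: vertical flip    */
                      | ((palette & 0xF) << 12));   /* bits 12-15: palette      */
}

/* Pull the fields back out of an entry. */
static unsigned entry_tile(uint16_t e)    { return e & 0x3FF; }
static unsigned entry_palette(uint16_t e) { return e >> 12; }
```

With no flips and palette 0, make_map_entry(4, 0, 0, 0) is just 4, which is why such an entry can be treated as a plain tile index.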

I think at this point we know just enough about maps to get into trouble so let us see if we can trick the DS into displaying one.

First Map Demo

In this demo we are going to attempt to reproduce the screen depicted above, the starting point for one of my favorite games of all time. To keep things simple we are not going to use any external graphics files or map editors; instead we are going to code the map and the tile graphics by hand. Our goal is not a perfect reproduction, mind you…only to get the idea. Let's take a look at the entire source code:

What do we get when we run it?

Let's take a look at the code. Notice the first thing we do is define our map. This map was hand coded, in that I decided how to place the tiles and edited the map array directly. Because this is a bit tedious I chose to use only 3 tiles: a tan one for the ground, a green one for the rocks, and a black one for the stairwell.

Next we have an array for our tiles. Three 8×8 tiles are defined, each a solid color. Normally we will use a graphics program to draw our tiles and a map editor to build our maps, but today is a day for simplicity. Notice each pixel in our tiles is simply a number which is a color index. To make it the color we want we will need to set the colors in the palette.

Now that our map and tile data are defined, let's take a closer look at main, starting with the video init code.

Notice we set mode 0, which gives us 4 regular background layers to play with. We also give our background engine some video memory to use; otherwise it would have no room for our map or tiles.

Next we init a background layer. We chose layer 0, and set its type to 8 bit per pixel (256-color paletted) and chose the smallest background of 256×256 pixels (32×32 tiles).

We manually set some colors for our tiles to use. Recall that the black tile used color index 0 so we set the palette index 0 to a Red Green Blue of (0,0,0). The other two colors follow (I used a paint program color picker to figure out what combination of red green blue made tan and the tint of green I needed).

Copying in the tiles is pretty straightforward. Just use the DS's hardware memory copy (DMA) function to move our tile array to the background’s tile graphics memory.

Finally, the only complicated bit of code in the demo. Had I the patience to hand make a 32×32 tile map I could have just done a straight copy from my map to the hardware map…but I was lazy. The original Zelda tiles are 16×16 pixels, which makes their map half as wide and half as tall as the one we need, so each entry of my map fills a 2×2 block of entries in the hardware map. This is done simply by dividing the hardware map's x and y indexes by two to find the matching entry in my map, remembering that the indexes are integers (anything left over from a division of integers gets truncated).

Hopefully that wasn’t too complicated. In the next few demos we will use a graphics editor to draw our tiles and a map editor to build our maps, taking some of the tedium out of the process.

Posted by dovoto

What is a register

The first concept we must get under our belts (and the only one that really matters at the moment) is the concept of memory mapped registers.

Now, I am sure you are aware that the DS has several different chips inside responsible for creating the images and sounds that accompany most games. There is sound hardware responsible for producing annoying chip tunes, the video hardware which puts all your convoluted data together in a nice and pretty display, the memory chips that hold the data for our programs, and the CPUs which are in overall control of the whole shebang (in all honesty, many of these “chips” are actually just parts of one large integrated circuit and not really separate chips).

When you are writing C code to describe the events in your game you are directly controlling the DS processors. But the CPUs do not work alone, and generally we like to have some control over what the rest of the system is doing. The method by which this is accomplished is the use of hardware registers.

Beginning at the memory address 0x4000000 and running for quite a few bytes is the memory mapped register space. What this means is that if I write some arbitrary value to the address 0x4000000, I will be writing to a register that will have some effect on how the system renders (or fails to render) my video game. Understanding registers is vital to understanding console development.

Using memory mapped registers requires some knowledge I hope you already possess (if you know any C at all), and that is: how to write data to any specific address.

Hopefully you remember the concept of a pointer but, if not, that’s okay because I will cover it briefly. Recall that a pointer is a type of variable that holds not data but, instead, the address of some type of data. In this example, let us say we know (which we do) that the register at 0x4000000 is 32 bits in length and controls the display, and we hence call it DISPLAY_CR; we might use code like the following to write to this address:

Now, this would work just fine, but there are some issues with this code. First, because these are hardware registers it is possible (even likely) that the values stored at these addresses will be changed by the hardware directly. This is something the compiler needs to know; otherwise it will try to optimize our code and we would miss these changes. The way we tell the compiler that variables change outside of the C code is to declare them volatile.

The other issue is we are using a variable (RAM) to store a constant. We would be much better served if we just used a #define…this also allows us to dereference the register in its declaration and makes writing to it a bit simpler. Here is the new, more proper code.

Notice there is no longer any need to dereference the register prior to use as it is implicit in the definition. Also, this code uses no space in memory for the pointer as it is just a constant (of course the compiler being as smart as it is even had you used a variable it likely would have been smart enough to optimize and the result would have been the same).
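Here is a host-side sketch of the two styles discussed above, with the volatile fix applied to both. On the DS the register lives at 0x4000000; in this sketch a plain variable (fake_display_cr, a hypothetical stand-in) takes its place so the code can run off-hardware, and the real-address forms are shown in comments.

```c
#include <stdint.h>

/* Stand-in for the register so this runs on a PC. */
static volatile uint32_t fake_display_cr;

/* Style 1: a pointer variable that must be dereferenced on every use.
   On hardware: volatile uint32_t *display_cr_ptr = (volatile uint32_t *)0x4000000; */
static volatile uint32_t *display_cr_ptr = &fake_display_cr;

/* Style 2: a macro that is already dereferenced and occupies no RAM.
   On hardware: #define DISPLAY_CR (*(volatile uint32_t *)0x4000000) */
#define DISPLAY_CR (*(volatile uint32_t *)&fake_display_cr)

static void set_display_style1(uint32_t value) { *display_cr_ptr = value; }
static void set_display_style2(uint32_t value) { DISPLAY_CR = value; }
```

With the macro, writing to the register reads like assigning to an ordinary variable, which is the style the rest of these pages assume.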

I hope this concept is clear to you; about 30% of the following pages are nothing but descriptions and examples of how to use the hardware registers to control the many features of the DS.

Twiddling Bits

It will rapidly become apparent that controlling hardware via registers will require an understanding of how to target specific bits inside the register. That is, we must be able to set or clear some bits in a register while leaving the rest untouched. Even though this is rather simple and talked about in many other places, it is important enough that we must waste the time of the 90% of readers who already know it in order to ensure the 10% who have no clue are not left in the dust.

Numbering Systems

Most of you can begin at 0 and count all the way to 9 (a feat by anyone’s reckoning). If so, you probably realize there are in fact 10 unique digits you will come across in this endeavor. Oddly enough, the numbering system we use on a daily basis is called base 10 (in academic elitist societies you may also hear it referred to as Decimal).

First let us review an interesting detail about base 10 that you already know: the significance of the placement of the digits in any number. If the digit is on the right-hand side of a number it is generally weighted less than those digits appearing on the left to such a degree one might call it exponential. For instance consider the following examples of decimal numbers:

This ringing any bells? You realize quite readily that the weight of the digit in question is equal to 10 to the power of the place of that digit in the number, counting places from zero at the rightmost digit.
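As a worked example of place weights (the number is our own choice):

```latex
305 = 3 \cdot 10^{2} + 0 \cdot 10^{1} + 5 \cdot 10^{0} = 300 + 0 + 5
```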

Unfortunately for us, computers do not use the same numbering system we do. The reason for this is a simple one. They only know 2 digits. Why is this unfortunate for us? It turns out that 90 percent of the time we can ignore the fact that computers use Base 2 because the compilers and tools we use automatically convert our base 10 numbers to binary for us. But, sometimes an understanding of the computer numbering system is crucial to coding.

The binary system works much in the same way as does our own decimal system with the exception that it has fewer digits and instead of weighting the value of a digit by 10 to the power of the place we instead weight it by 2 to the power of the place. For instance here are the same numbers as above in base 2 and their decimal equivalents.
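A small sketch that writes a number in base 2 by repeatedly peeling off its lowest bit; each pass produces the digit weighted by the next power of two. to_binary is a hypothetical helper written for this illustration.

```c
#include <stdint.h>

/* Write value as binary digits into out, which needs room for 33 chars. */
static void to_binary(uint32_t value, char *out)
{
    char tmp[32];
    int n = 0;
    do {
        tmp[n++] = (char)('0' + (value & 1u)); /* digit worth 2^(n-1) */
        value >>= 1;
    } while (value != 0);
    for (int i = 0; i < n; i++)               /* most significant digit first */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
}
```

For instance, to_binary(8, buf) yields "1000" and to_binary(13, buf) yields "1101", since 13 = 8 + 4 + 1.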

You might notice that writing the value 8 in binary requires 4 digits. This may not seem an issue offhand, but as numbers increase, writing binary rapidly becomes cumbersome: a string of ones and zeros is prone to error and very difficult to read. To combat this, programmers use a base that makes manipulating numbers on computers easier to handle. That numbering system is base 16, often referred to as Hexadecimal or Hex.

Hex numbering uses the digits 0 – 9 and A – F, where A through F stand for the values 10 through 15, so you end up with numbers such as 1F (31 in decimal) or FF (255 in decimal).

At first you may be wondering why the hell you would ever go through such seeming pain to write numbers in such a way. Well, it turns out the conversion between Hex and Binary is very simple. Because there are 16 digits in hex (a power of 2, mind you), you can represent each hex digit with exactly 4 binary digits. All you need do is become familiar with counting in binary from 0 to 15 to convert back and forth.

An example might be in order:

Take the binary number 1101001001010101010101010001. To convert this number to decimal would require you to look at each bit and add 2 to the power of the place if there is a one in that position. This number in decimal becomes 220,550,481.

To do the same conversion to Hex you break the number into 4-bit nibbles, starting from the right, and convert each one, so it becomes 1101 0010 0101 0101 0101 0101 0001 = D 2 5 5 5 5 1, or 0xD255551.
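The nibble trick is mechanical enough to code. This sketch groups a string of binary digits into 4-bit nibbles from the right, exactly as done by hand above; binary_to_hex is a hypothetical helper, and the standard strtoul can cross-check the decimal value.

```c
#include <stdlib.h>  /* strtoul, handy for cross-checking the result */
#include <string.h>

/* out needs room for (strlen(bits) + 3) / 4 + 1 chars. */
static void binary_to_hex(const char *bits, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t len = strlen(bits);
    size_t nibbles = (len + 3) / 4;
    for (size_t i = 0; i < nibbles; i++) {
        unsigned nibble = 0;
        /* nibble i, counted from the right, covers bit places 4i .. 4i+3 */
        for (size_t b = 0; b < 4; b++) {
            size_t place = i * 4 + b;
            if (place < len && bits[len - 1 - place] == '1')
                nibble |= 1u << b;
        }
        out[nibbles - 1 - i] = digits[nibble];
    }
    out[nibbles] = '\0';
}
```

Calling binary_to_hex("1101001001010101010101010001", buf) produces "D255551", and strtoul on the same string with base 2 gives 220550481.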

Now that you believe that hex to binary might be simpler than binary to decimal you may still be a bit unclear on why we don’t just write everything in decimal and let the compiler figure it out (because it will). As we said before computers are binary in nature and as such hardware is controlled by specific bits at specific addresses. Because we need to set specific bits in the binary number we must at some point think of the number in binary. This will become more apparent as we actually set those bits.

Another good reason is that power-of-two boundaries play a significant role in memory addressing on most systems. For instance, DS video memory for one of the graphics units begins on such a boundary. The address of this memory can be written in hex as 0x6000000; that same address in decimal is 100,663,296. Although you could store the value as a pointer and write to video memory using either numbering system, the hex value is much easier to remember and much easier on the eyes.

As a final note, in C hexadecimal numbers are denoted with 0x at the beginning of the number, as in 0x1F or 0x6000000.

Notice case is not an issue: 0xff, 0xFF and 0XFF all denote the same value, 255.

Bitwise Operations

Being able to ‘and’ and ‘or’ bits together is important when attempting to enable certain features of hardware. Below is a summary of how bitwise operators work in C and how you might use them.

AND operator: ‘&’

Unlike the logical and ‘&&’, the single ampersand denotes that two operands should be ‘anded’ together. The two numbers are compared bit by bit, and if either number has a 0 in a given bit position, a 0 is stored in that position of the result.

AND is useful for checking the status of bits. For instance to see if the first bit of a register is set simply AND the register with the value 1. The result can only be 1 if the first bit of the register is also 1.

It is also good for clearing bits. If you want the lower 8 bits of a number cleared to 0 simply AND the value with a number that has all the other bits set to 1. This is where hex comes in very handy.
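Both uses of AND, sketched with hypothetical helper names: testing bit 0, and clearing the low 8 bits with a hex mask whose other bits are all 1.

```c
#include <stdint.h>
#include <stdbool.h>

/* AND to test a bit: nonzero only if bit 0 of reg is set. */
static bool first_bit_set(uint32_t reg) { return (reg & 0x1) != 0; }

/* AND to clear bits: 0xFFFFFF00 keeps the upper 24 bits, zeroes the low 8. */
static uint32_t clear_low_byte(uint32_t value) { return value & 0xFFFFFF00u; }
```

clear_low_byte(0x12345678) gives 0x12345600: every hex digit of the mask that is F passes the original digit through, and every 0 wipes it.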

OR operator: ‘|’

A bit by bit comparison which causes the result bits to be set if either of the bits in the arguments are set.

OR is good for setting bits. To set a bit, simply OR the register with a value that has that bit and that bit only set. Here is an example of setting bit 9 of a 16-bit number (bits are numbered right to left beginning at 0)
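A sketch of setting bit 9; 1u << 9 is 0x0200, a value with only bit 9 set. set_bit9 is a hypothetical helper name.

```c
#include <stdint.h>

/* OR with a single-bit mask sets that bit and leaves the others alone. */
static uint16_t set_bit9(uint16_t value) { return (uint16_t)(value | (1u << 9)); }
```

If bit 9 is already set, ORing it again changes nothing, which is what makes OR safe for enabling a feature without disturbing the rest of a register.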

XOR Operator: ‘^’

XOR is great for flipping between states. In the above example, if an XOR was used in the place of an OR then the bit would be set if it was clear and it would be cleared if it were set. This is very useful anytime you need certain bits to alternate states every frame.
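The same bit 9, toggled with XOR instead. toggle_bit9 is a hypothetical helper; calling it twice returns the original value.

```c
#include <stdint.h>

/* XOR flips the chosen bit: set becomes clear, clear becomes set. */
static uint16_t toggle_bit9(uint16_t value) { return (uint16_t)(value ^ (1u << 9)); }
```

For example, 0x1234 has bit 9 set, so toggle_bit9(0x1234) is 0x1034, and toggling that again restores 0x1234.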

NOT Operator: ‘~’

NOT inverts the bits in a number, turning all 1s to 0s and all 0s to 1s. This is very useful for clearing bits. For instance, the main graphics engine renders to the top LCD when bit 15 of the power control register is set to 1 and to the bottom LCD when it is 0. We can ‘or’ the power control register with bit 15 to set it, and we can ‘and’ the register with bit 15 inverted to clear it. Here is a snippet from libnds.

In this case POWER_CR is defined, in the style shown earlier, as the dereferenced power control register, and POWER_SWAP_LCDS is defined as bit 15.
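A host-side sketch of the same set/clear pattern. A plain variable stands in for the power control register, and the names (fake_power_cr, MY_POWER_SWAP_LCDS, the two functions) are hypothetical; this is not the libnds source.

```c
#include <stdint.h>

static volatile uint16_t fake_power_cr;      /* stand-in for the register */
#define MY_POWER_SWAP_LCDS (1u << 15)        /* bit 15 */

/* OR sets bit 15: main engine on the top LCD. */
static void main_screen_on_top(void)    { fake_power_cr |= MY_POWER_SWAP_LCDS; }

/* AND with the inverted mask clears only bit 15: main engine on the bottom LCD. */
static void main_screen_on_bottom(void) { fake_power_cr &= ~MY_POWER_SWAP_LCDS; }
```

~MY_POWER_SWAP_LCDS has every bit set except bit 15, so the AND leaves all the other power bits exactly as they were.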

The final bitwise operations to talk about are the shift operations ‘>>’ and ‘<<’. When used these operators cause the binary number to be shifted to the left or right the specified number of places:

If you will recall, the weight of a digit is equal to the base raised to the power of its position in the number. When we shift numbers to the right ‘>>’ we are reducing the weight of the digits, effectively dividing the number by 2^n where n is the amount we shifted. Similarly, a left shift ‘<<’ multiplies by 2^n.
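Shifts as multiplication and division by powers of two, in a small sketch with hypothetical helper names. Unsigned types are used on purpose: in C, right-shifting a negative signed value is implementation-defined.

```c
#include <stdint.h>

/* x << n multiplies by 2^n (bits shifted past the top are lost). */
static uint32_t times_pow2(uint32_t x, unsigned n)  { return x << n; }

/* x >> n divides by 2^n, discarding the remainder. */
static uint32_t divide_pow2(uint32_t x, unsigned n) { return x >> n; }
```

For example, times_pow2(3, 4) is 48 and divide_pow2(100, 3) is 12, because 100 / 8 = 12.5 and the fraction is dropped.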

Shift operators can stand in for multiplication and division by powers of two, and they often execute more quickly. Do not get carried away though, as they are less readable and the compiler will convert such multiplications to shifts for you whenever possible.

The shift operator has other uses and plays a big role in fixed point arithmetic which we will cover shortly.

Talking to the keypad

It is difficult to do any interesting yet simple demo programs without understanding how to read user input. Fortunately for us, getting the state of the DS keys is exceedingly simple (if you understood the above discussion that is).

The state of each button is stored as a bit in memory mapped register space. To know if a key is pressed or released we just read the state of a specific bit. All we need to process the keys is the knowledge of where these values are stored.

Let us write our first real demo that checks for key presses and prints their state on the screen. Before we get to the code let us look at the main register used for key state on the DS.

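The main key register is REG_KEYINPUT at address 0x04000130. It is 16 bits wide and active low: a bit reads 0 while its key is held and 1 while it is released.

Bit 0: A
Bit 1: B
Bit 2: Select
Bit 3: Start
Bit 4: Right
Bit 5: Left
Bit 6: Up
Bit 7: Down
Bit 8: R shoulder
Bit 9: L shoulder
Bits 10-15: unused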

One thing you might note is the glaring absence of the X and Y keys. A bit further down we will demonstrate a more refined approach to handling input and introduce the functionality built into libnds and see if we can’t find those missing buttons. For now let us get our first real demo out of the way.

Much of this code is as you have seen before. We initialize the print console so printf prints to the sub screen using default settings. The next two lines initialize the libnds interrupt handler and enable the vblank interrupt. This is necessary for something we do in the main loop but it is a bit out of scope for this first day. We will talk at much greater length about interrupts (IRQs) on a later day.

The main loop just checks the KEY_A bit of the input register. This happens to be bit 0 and when the key is pressed that bit will be clear. This is all there is to checking for key presses on the DS.
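Because the register is active low, "pressed" means the bit reads 0. A host-testable sketch, assuming the raw register value is passed in rather than read from 0x04000130; MY_KEY_A and MY_KEY_B are hypothetical stand-ins for libnds's KEY_A (bit 0) and KEY_B (bit 1).

```c
#include <stdint.h>
#include <stdbool.h>

#define MY_KEY_A (1u << 0)
#define MY_KEY_B (1u << 1)

/* A clear bit means the key is held down. */
static bool key_pressed(uint16_t keyinput, uint16_t key_mask)
{
    return (keyinput & key_mask) == 0;
}
```

On hardware the call would look like key_pressed(REG_KEYINPUT, KEY_A) inside the main loop.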

The next two lines of code force the DS to wait until the screen is done drawing and then clear the screen. This prevents some nasty looking text flickering. Again, understanding this bit of code requires some knowledge of interrupts, which will have to wait.


Now that simple reading of the key presses has been covered it is time to consider a bit more advanced needs…such as what about those X and Y buttons?

Unfortunately the designers of the DS were a bit lazy and stole the GBA input hardware, and it seems our wonderful GBA input was a bit lacking in the number of buttons available to the user. The result is that we can read the A, B, Up, Down, Left, Right, Start, Select, and the Left and Right shoulder buttons from one place, but the X and Y buttons are in a different register…and can’t even be read at all by the main CPU! It seems in our very first foray into DS programming we must face the complexities of a dual processor system.

The solution is to read the keypad from the ARM 7 (which can read the X and Y buttons plus the hinge “button” on the DS lid and the pen state for the touch pad) and put the results someplace readable by the ARM 9.

If you are like me then this code seems a bit awkward. It would be nice if we had some way of wrapping all these bits into a single location to simplify the reading of key presses. Libnds provides just such a wrapper. Let us see how we would do the same code using the libnds wrapper then take a closer look at what the wrapper is doing.

Two things to note in this new demo are the use of a scanKeys() call every frame and the change to positive logic: now the bits are set if the key is pressed and clear if it is released. Along with keysHeld() there is keysDown(), which will only be true if the key was pressed since the previous scanKeys() call (i.e. it will return true once, but unless the player releases the key and presses it again it will return false).

Some other useful functions are keysUp(), which returns the released keys, and keysDownRepeat(), which returns true again after a certain delay (measured in number of scanKeys() calls) even if the keys have been held down. Check the documentation for input.h for more information on how to use these other functions.

Basically the way scanKeys works is to combine the bits from REG_KEYINPUT and IPC->buttons and apply a little bit of state tracking to determine which have been pressed since the last call. Here is an excerpt from the key handling code in libnds:

The statement at the beginning does most of the work by negating the register state and masking out / recombining the two sources of key state. If you have not noticed by now you will need to have a decent understanding of bit operations to work with register controlled hardware.
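A hypothetical sketch of the idea behind scanKeys(), not libnds's actual code: invert the active-low inputs, merge X and Y into bits 10 and 11, and compare with the previous scan to find new presses and releases. The KeyState type and all function names are our own, and the X/Y bit positions in xy_buttons (bit 0 = X, bit 1 = Y) are an assumption.

```c
#include <stdint.h>

typedef struct {
    uint16_t held;      /* keys down on this scan     */
    uint16_t previous;  /* keys down on the last scan */
} KeyState;

static void scan_keys(KeyState *ks, uint16_t keyinput, uint16_t xy_buttons)
{
    ks->previous = ks->held;
    ks->held = (uint16_t)((~keyinput & 0x03FF)               /* A..L -> bits 0-9  */
                          | ((~xy_buttons & 0x0003) << 10)); /* X, Y -> bits 10-11 */
}

/* Newly pressed: held now, not held last time. */
static uint16_t keys_down(const KeyState *ks) { return ks->held & ~ks->previous; }

/* Newly released: held last time, not held now. */
static uint16_t keys_up(const KeyState *ks)   { return ks->previous & ~ks->held; }
```

The positive logic falls out of the inversion, and the down/up tracking is just an AND of the current state with the negated previous one.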

Frame buffer…finally

It is nice to finally get to graphics programming…I don’t know about you but two full days of fluff is about all I can take.

If you have actually followed along with the subjects and code presented so far you will find doing frame buffer graphics on the DS is surprisingly simple. All we need do is put the DS into frame buffer mode and begin writing images to the screen.

If you recall from yesterday’s topic the DS supports many graphics modes with the main screen supporting a simple frame buffer. It is this mode we will turn to first as it is very easy to set up and even easier to use.

I figured I would start this chapter with some code and use that to explain the frame buffer mode.

This code is very similar to the code for the day 1 introduction demo. As before we enable interrupts and turn on the vblank interrupt. Next we set the video mode to a frame buffer mode which uses the first video ram bank. We then set the first VRAM bank to be writable by the CPU and to act as a buffer for the LCD (recall the first VRAM bank is VRAM_A).

The main loop creates a color as a 16 bit unsigned short integer and sets its value to red. If A or X are pressed then the color is altered to be green or blue respectively. Finally we wait for the screen draw to finish and fill VRAM_A with the selected color.

Now that we have a base understanding of the code we need to get to the details, namely:

  • How do videoSetMode and vramSetBankx work?
  • How does the DS treat color?
  • What the hell is a frame buffer and how do I write pixels to it?

These are the questions we will now explore.

Display Control

VRAM Control

DS Color Formats

The DS has several ways in which it represents color. These ways generally fall into two categories: Paletted and Direct.

Direct color means the value directly controls the intensity of red, green, and blue that is fed to the pixel. There are technically two direct color formats used by the DS but you will see the variance between the two is minimal.

Direct color uses 5 bits to represent how bright each color component can be (red, green, and blue). We refer to this format as 555 or sometimes 15 bit color. If you recall from our discussion on binary numbers, 5 bits amounts to 32 levels of intensity for each of the three colors.

To describe a color in this format we need to combine our desired values of red, green, and blue to form one 15-bit number. The color components are stored as follows (bit 15 on the left, bit 0 on the right):

xBBBBBGGGGGRRRRR

This can be translated as: the least significant 5 bits hold the red component, the next 5 bits hold the green, and the next 5 hold the blue. We refer to this as BGR format. This would be a good time to flex our bit twiddling muscles and see if we can't define some colors.

To do this we specify an intensity for each of the 3 components and then shift the values into the correct place, finally we OR them all together to get our 15 bit value.

Simple enough? Since we don’t normally want to concern ourselves with this detail we use the macro provided by libnds (or write our own) which is depicted below:
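The shift-and-OR recipe, written out once as a step-by-step function and once as a macro. pack_rgb15 and MY_RGB15 are our own names; libnds provides an equivalent RGB15 macro.

```c
#include <stdint.h>

/* Shift each 5-bit component into place, then OR them together. */
static uint16_t pack_rgb15(unsigned r, unsigned g, unsigned b)
{
    uint16_t red   = (uint16_t)(r & 31);          /* bits 0-4   */
    uint16_t green = (uint16_t)((g & 31) << 5);   /* bits 5-9   */
    uint16_t blue  = (uint16_t)((b & 31) << 10);  /* bits 10-14 */
    return (uint16_t)(red | green | blue);
}

/* The same packing as a macro, usable in constant initializers. */
#define MY_RGB15(r, g, b) ((uint16_t)((r) | ((g) << 5) | ((b) << 10)))
```

Full red is MY_RGB15(31, 0, 0) = 0x001F, full green 0x03E0, full blue 0x7C00, and white 0x7FFF.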

Taking a short break from theory you should now glance up to the demo we just wrote. You will notice I used this macro to paint the screen red. It should be apparent at this point that the frame buffer utilizes 15 bit Direct color format.

The other Direct color format is nearly identical to the one just discussed. The only difference is in the most significant bit (depicted as the ‘x’ in xBBBBBGGGGGRRRRR above). When utilizing this format this bit is known as the ‘alpha’ bit, and when set to 0 it will prevent the color from appearing onscreen. Most 16 bit graphics operations on the DS utilize this ‘alpha’ bit to determine the transparency of the rendered pixel.

When we move on to Direct color bitmap modes you will quickly discover that not setting this alpha bit will result in nothing on screen. Setting this bit requires one more bit operation:
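A sketch of that extra step: OR bit 15 into an otherwise ordinary 15-bit color. MY_RGB15, MY_ALPHA_BIT and opaque are hypothetical names for illustration.

```c
#include <stdint.h>

#define MY_RGB15(r, g, b) ((uint16_t)((r) | ((g) << 5) | ((b) << 10)))
#define MY_ALPHA_BIT      (1u << 15)

/* Set the top bit so the pixel is drawn in modes that honor it. */
static uint16_t opaque(uint16_t color) { return (uint16_t)(color | MY_ALPHA_BIT); }
```

opaque(MY_RGB15(31, 0, 0)) is 0x801F: the same red as before, now with the alpha bit on.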

Although direct color formats provide a wide range of colors they have a serious drawback: They take up 16 bits for each pixel. Although the DS has an abundant amount of memory and CPU power compared to early 2D systems it still pales in comparison to most modern machines. You will be surprised how rapidly you will fill memory with a 16 bit image or how much stress you will place on the hardware if you attempt to blit large 16 bit images.

To alleviate this the DS utilizes many space saving tricks. The most prevalent is the use of paletted colors. Instead of specifying color components directly we instead build a table of colors and specify an index into this table.

Let us say we have a 256 color table (we will refer to this table as the “palette”) which contains 256 direct color values. We can then set pixels onscreen to these values by specifying an index. Because the table is small we would only need an 8 bit index to describe the pixel color…a savings of 50%!
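A sketch of what the hardware does with a palette, plus the memory arithmetic behind the 50% claim. palette, resolve_pixel and image_bytes are hypothetical names; the 256×192 figures are the DS screen size.

```c
#include <stdint.h>
#include <stddef.h>

/* 256 direct colors; each pixel stores only an 8-bit index into this. */
static uint16_t palette[256];

static uint16_t resolve_pixel(uint8_t index) { return palette[index]; }

/* Bytes needed for a w x h image at bpp bits per pixel. */
static size_t image_bytes(size_t w, size_t h, size_t bpp) { return w * h * bpp / 8; }
```

A full 256×192 screen costs 98,304 bytes at 16 bits per pixel, but only 49,152 bytes at 8 bits per pixel plus a 512-byte palette.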

The DS supports 8 bit indexed palettes as well as 4 bit and we will figure out the mechanics of their use as we proceed.

Frame buffer 101

A frame buffer can be described as a direct map of memory values to onscreen colors. By simply writing the correct color value into memory we can set a pixel as we see fit.

Framebuffer memory can be completely described by three things: The address at which it begins, the color format of the pixels, and the number of pixels per horizontal line.

The memory for a frame buffer is a single linear map such that the first W entries correspond to the top row of pixels on the screen (W in this case is the “width” of the buffer). The next row of pixels follows and occupies entries W to 2*W – 1.

(Figures: framebuffer memory as a linear array, and the same memory as it maps to 2D screen space.)

Pixels and things

To accurately place pixels onscreen we must have some idea how to specify location. This is normally done using a modified Cartesian coordinate system where we specify how many pixels from the left and how many pixels from the top we wish our value to be placed.

(Figure: the screen coordinate system.)

The distance from the left hand side of the screen is usually referred to as the X coordinate of the pixel and the distance from the top is the Y coordinate. (Those of you who are math whizzes might notice this differs from Cartesian coordinates, where Y is usually measured from the bottom up…live with it.)

To figure out the offset into framebuffer memory we need to perform a simple calculation based on the X and Y coordinates we wish to affect. Because memory is arranged linearly in the buffer to get to the correct horizontal line we simply multiply the number of pixels on a line by the value of the Y coordinate. We then add the value of X and we have our offset.
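The offset calculation in code, with a plain array standing in for VRAM_A. SCREEN_W and SCREEN_H match the DS's 256×192 screen; framebuffer and plot_pixel are hypothetical names, and the bounds check is our own addition.

```c
#include <stdint.h>

#define SCREEN_W 256
#define SCREEN_H 192

static uint16_t framebuffer[SCREEN_W * SCREEN_H]; /* stand-in for VRAM_A */

static void plot_pixel(int x, int y, uint16_t color)
{
    if (x < 0 || x >= SCREEN_W || y < 0 || y >= SCREEN_H)
        return;                            /* ignore off-screen coordinates */
    framebuffer[y * SCREEN_W + x] = color; /* skip y full rows, then x pixels */
}
```

The pixel at (10, 2) lands at index 2 × 256 + 10 = 522.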

(Figure: computing a pixel's offset in framebuffer memory.)

Let us translate this new knowledge of pixel plotting and color formats and see if we can produce an interesting (sort of) demo.

For our first pixel demo we will do a starfield with little floating dots. Each dot will make its way across the screen at a random speed. When it reaches the end we will move it back to the beginning and give it a new random height and new random speed. This should give us a nice star-trekie feeling demo of a moving star field.

Here is the source in its entirety which we will pick apart below; you can cut and paste this code into your main.c (or template.c if it is so named) from our first few demos. You can then build and run the demo.

We begin with a structure to define our star. It needs a location in the form of an X and Y coordinate, it needs speed, and finally it needs color.

We then need an array of stars we can track across the screen:

Before we start the demo, we need to clear the pixels of any color information they currently have. In other words, we are making sure we start with a black screen.

To start the demo off it would be nice if we could arrange our stars randomly about the screen. We do this with an initialize function.

This function loops through all stars and sets the color to white, the speed to a random value between 1 and 4 and the X and Y to some random location on screen. If the ‘%’ is unfamiliar to you I will give a brief explanation.
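The star structure, the star array, and the initialize function described above might be sketched like this. The names `Star`, `stars`, `NUM_STARS`, and `init_stars` are hypothetical, the star count is arbitrary, and white is written as the raw 15-bit value 0x7FFF (all three 5-bit channels at 31, which libnds would express as RGB15(31,31,31)).

```c
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192
#define NUM_STARS     40   /* arbitrary star count for the sketch */

typedef struct {
    int x;          /* pixels from the left edge */
    int y;          /* pixels from the top edge */
    int speed;      /* pixels moved per frame */
    uint16_t color; /* 15-bit color value */
} Star;

static Star stars[NUM_STARS];

/* Scatter every star at a random position with a random speed of 1 to 4. */
static void init_stars(void)
{
    for (int i = 0; i < NUM_STARS; i++) {
        stars[i].color = 0x7FFF;              /* white: every channel at 31 */
        stars[i].speed = rand() % 4 + 1;      /* 0..3 shifted up to 1..4 */
        stars[i].x = rand() % SCREEN_WIDTH;   /* 0..255 */
        stars[i].y = rand() % SCREEN_HEIGHT;  /* 0..191 */
    }
}
```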

‘%’ performs a division and returns the remainder of that division.

rand() returns a non-negative pseudo-random integer, so taking it modulo a number n (rand() % n) gives a value from 0 up to n - 1. To generate a random number in any range between MIN and MAX is simply:
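A small helper for the MIN-to-MAX formula could look like this. `random_range` is a hypothetical name, and this version includes both MIN and MAX in the range, which is why it adds 1 before taking the remainder.

```c
#include <stdlib.h>

/* Return a pseudo-random integer from min to max, inclusive of both ends.
 * rand() % n yields 0..n-1, so n = max - min + 1 values are possible,
 * and adding min shifts that range to start at min. Assumes min <= max
 * and that max - min + 1 does not exceed RAND_MAX. */
static int random_range(int min, int max)
{
    return rand() % (max - min + 1) + min;
}
```

So a star speed of 1 to 4 is random_range(1, 4). The modulo trick is very slightly biased toward smaller values unless the range divides evenly into rand()'s range, which does not matter for a starfield.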

Next we need functions to move, draw, and erase the star. Let us begin with erase.

To erase, we just set the star's location in the framebuffer to the background color (black in our case).

Similarly, to draw the star we set its location in the framebuffer to the star's color:

The final step is to move the star to its new location:

Moving a star is simple: we just add its speed to its current X position. The caveat is that we must then check whether the star has gone off screen. To do that we compare its X location to the width of the screen. If it is greater than or equal to the width (the visible columns run from 0 to width - 1), we know we are off the screen and can take appropriate action.

When a star goes off screen we move it back to the left by setting its X value to 0. We then give it another random speed and random Y value, making it look like a new star has come on screen.
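Here is a sketch of the move step with the wrap-around handled as described. `move_star` and the `Star` layout are hypothetical; note the test uses `>=` because the visible columns run from 0 to SCREEN_WIDTH - 1, so an X equal to the width is already off screen.

```c
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192

typedef struct { int x, y, speed; uint16_t color; } Star;

/* Advance the star by its speed; once it passes the right edge, restart it
 * at the left edge with a fresh random height and speed (1 to 4). */
static void move_star(Star *s)
{
    s->x += s->speed;
    if (s->x >= SCREEN_WIDTH) {
        s->x = 0;
        s->y = rand() % SCREEN_HEIGHT;
        s->speed = rand() % 4 + 1;
    }
}
```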

The main loop which controls the demo consists of looping through each star and first erasing it from its old position, then moving it to its new position, and finally redrawing it at its new location:
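Putting erase, move, and draw together, one frame of the main loop might be sketched as below. As before, `framebuffer` is a stand-in array so the code runs on a PC and all names are hypothetical; on the DS you would call this once per frame, typically after waiting for vblank (for example with libnds's swiWaitForVBlank()).

```c
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_WIDTH     256
#define SCREEN_HEIGHT    192
#define NUM_STARS        40
#define BACKGROUND_COLOR 0x0000  /* black */

typedef struct { int x, y, speed; uint16_t color; } Star;

static uint16_t framebuffer[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for VRAM */
static Star stars[NUM_STARS];

/* Paint the star's current pixel with the background color. */
static void erase_star(const Star *s) { framebuffer[s->y * SCREEN_WIDTH + s->x] = BACKGROUND_COLOR; }

/* Paint the star's current pixel with its own color. */
static void draw_star(const Star *s) { framebuffer[s->y * SCREEN_WIDTH + s->x] = s->color; }

/* Advance by speed; wrap to the left edge with a new height and speed. */
static void move_star(Star *s)
{
    s->x += s->speed;
    if (s->x >= SCREEN_WIDTH) {
        s->x = 0;
        s->y = rand() % SCREEN_HEIGHT;
        s->speed = rand() % 4 + 1;
    }
}

/* One animation frame: erase at the old spot, move, redraw at the new spot. */
static void update_stars(void)
{
    for (int i = 0; i < NUM_STARS; i++) {
        erase_star(&stars[i]);
        move_star(&stars[i]);
        draw_star(&stars[i]);
    }
}
```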

And so ends our pixel plotting demo. Below are a few more demos explained in the same excruciating manner as above.

Color Bar Demo

Touching things

The touch pad is an amazing addition to a handheld video game system that not only makes for interesting gameplay but also adds a new level of fun to game programming.

This section will introduce the touch pad, show you how it works, and give a quick demo on its use.

The DS touchpad utilizes a resistive coating which changes conduction depending on the area of the contacting object. This change is measured by analog-to-digital converters on a special chip inside the DS and translated into an X and Y location. These measurements can also be used to determine the area of the contact point, which, to some degree, can be translated into pressure.

To get to this raw data we must communicate with this chip via a serial interface that is only accessible from the ARM7. I do not currently have the stomach to go into serial comms in this tutorial, but if you have a mind to explore such things the source code is in the ARM7 code base of libnds.

For now I am just going to do a bit of hand waving and tell you there is code running on the ARM7 in the default ARM7 template which reads out the state of the touch pad. This data is raw and does not correlate exactly to the dimensions of the screen, meaning some processing must be done to convert this raw location data to useful pixel coordinates.

The libnds ARM7 stub does the appropriate conversion and communicates the result to the ARM9. These values are then read using the touchRead() function, which writes both the raw coordinates and the converted pixel coordinates into the structure its pointer parameter points to.
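libnds does this conversion for you, so you never need to write it yourself. Purely to illustrate the idea, here is a sketch of a two-point linear calibration for one axis: two raw readings taken at known screen positions define a straight line that maps any raw reading to a pixel. This is not libnds's actual code; `AxisCalibration`, `raw_to_pixel`, and the calibration numbers in the example are all hypothetical.

```c
/* Hypothetical two-point calibration for one axis: raw readings measured at
 * two known screen positions. Real values differ from unit to unit. */
typedef struct {
    int raw1, screen1;  /* raw reading taken at the first reference pixel */
    int raw2, screen2;  /* raw reading taken at the second reference pixel */
} AxisCalibration;

/* Linearly map a raw reading through the two reference points, then clamp
 * the result to the valid pixel range 0..size-1. */
static int raw_to_pixel(int raw, const AxisCalibration *c, int size)
{
    int pixel = c->screen1
              + (raw - c->raw1) * (c->screen2 - c->screen1) / (c->raw2 - c->raw1);
    if (pixel < 0)
        pixel = 0;
    if (pixel > size - 1)
        pixel = size - 1;
    return pixel;
}
```

With calibration points (raw 400 at pixel 32) and (raw 3600 at pixel 224), a raw reading of 2000 maps to pixel 128.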

You can also get a go/no-go test of the pen by using the scanKeys() function we discussed earlier. This tells you whether the pen is up or down, so you know when to read the touch pad.

Now for a simple demo. We will make the simplest art program possible: it will render randomly colored dots wherever you touch the screen.

There is not too much to say about this demo…but of course I am going to say it anyway.

You should notice I skipped the interrupt setup for this demo. This is for two reasons: there is no real animation cycle, and we only render one pixel at a time. This means that even if we draw while the screen is being rendered, there won't be anything to tear.

Notice also the use of scanKeys(). We use keysHeld() instead of keysDown() because keysDown() only returns true on the first frame the pen touches, while keysHeld() keeps returning true until the pen is lifted again. This lets us keep drawing dots without lifting the pen.
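The difference between the two is easiest to see in a tiny simulation. The sketch below keeps the previous and current button bits, the way a per-frame scan would; `held` reports everything currently down, while `down` reports only bits that were up on the last scan and are down now. `KeyState`, `scan`, `held`, `down`, and `PEN_BIT` are hypothetical stand-ins for scanKeys(), keysHeld(), keysDown(), and KEY_TOUCH, not the libnds implementations.

```c
#include <stdint.h>

/* Hypothetical pen bit; libnds defines its own KEY_TOUCH constant. */
#define PEN_BIT (1u << 12)

typedef struct {
    uint32_t previous; /* buttons down on the last scan */
    uint32_t current;  /* buttons down on this scan */
} KeyState;

/* Record a new sample, as a once-per-frame scan would. */
static void scan(KeyState *k, uint32_t raw)
{
    k->previous = k->current;
    k->current = raw;
}

/* Held: anything down right now. */
static uint32_t held(const KeyState *k) { return k->current; }

/* Down: only bits that are down now but were up on the previous scan. */
static uint32_t down(const KeyState *k) { return k->current & ~k->previous; }
```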

When using the touchRead() function, we need to pass in a pointer to the variable touch rather than the variable itself. Putting an ampersand (&) in front of a variable gives the address of that data.

Color is selected at random using rand(). Assigning its result to a 16-bit color keeps just the low 16 bits, which conveniently gives us a random 16-bit color.

When drawing rapidly you probably noticed big gaps between your dots, as opposed to nice smooth curves. This is because the touch coordinates are only updated once per frame, and you can move the pen a lot faster than that. We will make this demo a little prettier when we learn how to draw lines a few pages hence.

Bitmap Graphics Modes

I talked a bit before about the DS graphics modes and alluded to being able to compose a scene from layers of graphics. Normally all rendering is done to one of these layers and this section will be the first real use of the 2D engine. Before we talk about the specifics let us look again at the possible graphics modes and what each layer can do in these modes.

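Graphics Modes

Main 2D Engine

Mode           BG0       BG1    BG2            BG3
Mode 0         Text/3D   Text   Text           Text
Mode 1         Text/3D   Text   Text           Rotation
Mode 2         Text/3D   Text   Rotation       Rotation
Mode 3         Text/3D   Text   Text           Extended
Mode 4         Text/3D   Text   Rotation       Extended
Mode 5         Text/3D   Text   Extended       Extended
Mode 6         3D        -      Large Bitmap   -
Frame Buffer   Direct VRAM display as a bitmap

Sub 2D Engine

Mode           BG0       BG1    BG2            BG3
Mode 0         Text      Text   Text           Text
Mode 1         Text      Text   Text           Rotation
Mode 2         Text      Text   Rotation       Rotation
Mode 3         Text      Text   Text           Extended
Mode 4         Text      Text   Rotation       Extended
Mode 5         Text      Text   Extended       Extended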

You will notice each engine has several modes of operation, each with a different background layer configuration. The backgrounds we are interested in here are the ones marked “Extended”.

Extended rotation background layers can be configured as linear frame buffers, much like the one we have been using in the examples above.

When we talk tomorrow about tile-based graphics we will cover in detail the capabilities of the “extended” backgrounds as well as the text and rotation backgrounds. For now we need to understand only a few things: first, how to put the display into a mode which supports an extended background; next, how to turn on the correct background; and finally, how to initialize the background properly.

The first thing to remember when using the 2D engine is how little memory is available. In fact, by default the DS assigns no memory to its 2D units other than sprite attributes and base palettes.

In order for us to do anything we must map video memory somewhere the 2D engine can find it. To do this we must know where the engine is going to expect memory to be and we need to know what video memory can be mapped to these regions. What follows is the layout of 2D graphics memory.


The areas we are concerned with now are the background graphics memories. To use background layers we must map video memory to at least one of these regions.

Let us start with a small example which paints the screen red and see how using a background layer differs from direct frame buffer access.

To better understand the memory layout for 2D backgrounds, we must first look at the image below. Background memory is logically divided into 32 blocks for graphics (and also 32 blocks for map data, but that is a topic for tomorrow). We will revisit this organization of graphics memory tomorrow; for today it is enough to know that the background pulls its data starting at one of these blocks, and which block it pulls from is controlled via the background control register.

[Figure: NDS 2D background memory layout]
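Since each bitmap block is 16 KB, finding where a background will read its bitmap from is simple arithmetic. The sketch assumes main background memory starts at 0x06000000 (the address libnds exposes as BG_GFX); `bitmap_base_address` is a hypothetical helper, and the resulting address is only meaningful on the DS itself.

```c
#include <stdint.h>

#define BG_MEMORY_START 0x06000000u     /* start of main background memory */
#define BASE_BLOCK_SIZE (16u * 1024u)   /* each bitmap base block is 16 KB */

/* Address where a bitmap background reads its pixels for base block 0-31. */
static uint32_t bitmap_base_address(unsigned base)
{
    return BG_MEMORY_START + base * BASE_BLOCK_SIZE;
}
```

Block 3, for instance, starts 3 × 16 KB = 0xC000 bytes in, at 0x0600C000.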

First, we set the video mode. If you look back at the video mode table, you will see that mode 5 allows background layers 2 and 3 to be extended rotation backgrounds. We chose background 3 for this demo, although background 2 would have worked just as well.

Next we map VRAM bank A to main background memory. The VRAM table shows we could have mapped other VRAM banks to this region as well. Picking which bank to map takes a bit of planning, but for our simple demos it will not be too difficult.

Next, we initialize the background. bgInit is declared in nds/arm9/background.h as bgInit(int layer, BgType type, BgSize size, int mapBase, int tileBase). In this demo we wanted a bitmap mode, and since we have not really discussed palettes yet we are going to stick with 16-bit color. We choose a size of 256×256 to cover the entire screen. Note that bgInit does not change BG_GFX: BG_GFX is a fixed pointer to the start of main background memory, so a layer whose bitmap base is 0 (layer 3 in this case) draws from the data at BG_GFX. For other bases, bgGetGfxPtr() returns the layer's actual graphics address.

The final step is to paint the screen red. Take a look at the following lines and see if you can spot the difference from the straight framebuffer code.

Hopefully you noticed that bit 15 of the color value is set. Thinking back to our earlier talk about color formats on the DS, you might recall there is a 16-bit color mode with the normal 5 bits each of red, green, and blue, plus one bit for alpha. 16-bit bitmap backgrounds use this color format.

The one bit of alpha tells the DS to render that pixel. If you leave this bit clear the pixel will not be drawn and anything behind it will show through. This is a very useful feature when you have two layers of background and want part of the top layer to be transparent.
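Here is a minimal sketch of building such a color by hand: 5 bits each of red, green, and blue, plus bit 15 to make the pixel visible. `rgb15` and `opaque` are hypothetical helpers; the bit layout matches what libnds's RGB15() macro produces, and libnds also provides a BIT() macro for setting bit 15.

```c
#include <stdint.h>

/* Pack 5-bit red, green and blue (0-31 each) into a 15-bit color:
 * red in bits 0-4, green in bits 5-9, blue in bits 10-14. */
static uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    return (uint16_t)((r & 31) | ((g & 31) << 5) | ((b & 31) << 10));
}

/* Set bit 15 so a 16-bit bitmap background actually shows the pixel;
 * with the bit clear the pixel is transparent. */
static uint16_t opaque(uint16_t color)
{
    return (uint16_t)(color | (1u << 15));
}
```

Full red is rgb15(31, 0, 0) = 0x001F; with bit 15 set it becomes 0x801F, which is what you would write to a 16-bit bitmap background to get a visible red pixel.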

Next we will look at paletted bitmaps through the somewhat practical exercise of decoding graphics files.

Bitmap on the sub display

For completeness' sake, here is an example which puts both the main and sub engines in mode 5 and renders to bitmap backgrounds. Notice the alternative register access using the background struct instead of the bgInit function.

Background struct

libnds defines a struct for easy access to the background registers.

Working With Graphics Files

Being able to decode graphics files is a useful skill, and although this is usually done on the PC we are going to do it directly on the DS just for kicks. There are a lot of different graphics file formats out there, and each has advantages and disadvantages. For our purposes we need one that is easy to decode and supported by many graphics applications. Some good choices would be GIF, BMP, and PCX.

I am going to tackle the BMP for this example. It is simple and supported by just about every graphics application on Earth.

To decode a file format you must first seek out its spec. A quick search on Google gives me the following information for bitmap files.

There is a short bitmap header followed by a variable-length image header (which almost always turns out to be 40 bytes long). Next comes the palette (if there is one) and finally the pixel data. Here is a bit more detail:

Bitmap File

Bitmap Header (14 bytes)

Size in bytes   Contents
2               The characters “BM”
4               File size in bytes
4               Reserved (usually set to 0)
4               Offset to data

Image Header

Size in bytes   Contents
4               Size of image header (normally 40)
4               Width of the image
4               Height of the image
2               Number of planes (normally 1)
2               Color depth (bits per pixel)
rest            The rest is not interesting and hardly ever used

Palette Data

The color palette, stored as Blue, Green, Red bytes (with an extra byte of padding, so 4 bytes per color)

Graphics Data

The pixel data. This will either be indexes into the palette or raw color values (also stored in blue, green, red order).

BMPs support a number of bitmap types, and although this example assumes 8-bit, 256-color bitmaps, it could very easily be extended to other color depths (an excellent exercise for the reader if you are of a mind for those sorts of endeavors).

For a first step let us write a short demo which looks at the header, checks whether it is a bitmap file by reading the signature, and prints out the bits per pixel, height, and width.

[Figure: Bitmap Header Decode]

We are going to use a trick called an overlay to read the header. Instead of parsing each byte and figuring it out, we are going to define a structure with the same layout as the header. We can then pretend the start of the bitmap is the start of this structure and access all the attributes like normal structure members. Let's start with the bitmap header struct.

The idea is to look at the layout of the header and design a struct to match it. The first two characters are the signature so we add a character array of length 2 to line up with this. We do this for each element in the header.

Unfortunately for us, the C language leaves the exact memory layout of a structure up to the compiler, and a compiler will often insert padding bytes between members (and at the end) so that each member sits on its natural alignment, for example a 32-bit field on a 4-byte boundary. It does this because accessing aligned data is generally more efficient. For us, the structures must be packed together so we can lay them on top of our bitmap data with no padding throwing us off. To ensure this is the case, we add the packed attribute to the structure definition… how you do this varies between compilers, but for gcc __attribute__((packed)) works like a charm.

Next we need a struct to hold the image header which we will just assume is 40 bytes, even though it could technically be variable.

Notice again the use of the packed attribute.

Finally we define a structure to hold the entire bitmap: both headers, the palette, and the image data.

A bitmap file just consists of the two headers back to back, followed by palette data and finally the image. This is where forgetting the packed attribute would byte you in the ass, as gcc would stick a few bytes of padding in between the structs and things would not line up (go ahead… try it).

The palette colors are stored as blue, green, red bytes followed by one empty byte. Since we are only going to decode 256 color bitmaps we are going to hardcode this into our bitmap structure. If you want to extend this you will need to do a small amount of parsing and first read in the bits per pixel and the number of colors before you do anything with the palette or image data.

The final entry in the bitmap structure might seem a bit odd, as it is an array of length one. Since we don’t know in advance how big the image array needs to be, we give it a length of one; because it is an overlay, we can just keep reading past the end as far as we like.
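As a sketch of the two headers, here are packed structs matching the table above, with the “rest” of the image header spelled out using the standard field sizes so the struct comes to 40 bytes. The struct and field names are our own. Rather than casting a pointer over the data, this version copies the bytes into the struct with memcpy, which gives the same layout while sidestepping alignment and aliasing concerns on a PC; like the overlay, it assumes a little-endian machine such as the DS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 14-byte file header. Without packed, gcc would insert 2 padding bytes
 * after signature so that fileSize starts on a 4-byte boundary. */
typedef struct __attribute__((packed)) {
    char     signature[2];  /* "BM" */
    uint32_t fileSize;      /* whole file size in bytes */
    uint32_t reserved;      /* usually 0 */
    uint32_t dataOffset;    /* where the pixel data starts */
} BmpFileHeader;

/* The usual 40-byte image header. */
typedef struct __attribute__((packed)) {
    uint32_t headerSize;    /* normally 40 */
    int32_t  width;
    int32_t  height;        /* positive means rows are stored bottom-up */
    uint16_t planes;        /* normally 1 */
    uint16_t bitsPerPixel;
    uint32_t compression;
    uint32_t imageSize;
    int32_t  xPixelsPerMeter;
    int32_t  yPixelsPerMeter;
    uint32_t colorsUsed;
    uint32_t importantColors;
} BmpImageHeader;

/* Both headers back to back, exactly as they appear in the file. */
typedef struct __attribute__((packed)) {
    BmpFileHeader  file;
    BmpImageHeader image;
} BmpHeaders;

/* Copy the start of a file image into the packed struct. Returns 0 if the
 * data is too short or does not start with "BM". */
static int read_bmp_headers(const uint8_t *data, size_t size, BmpHeaders *out)
{
    if (size < sizeof(BmpHeaders))
        return 0;
    memcpy(out, data, sizeof(BmpHeaders));
    return out->file.signature[0] == 'B' && out->file.signature[1] == 'M';
}
```

A quick check is that sizeof(BmpFileHeader) is 14 and sizeof(BmpImageHeader) is 40; drop the packed attributes and the file header grows (typically to 16 bytes).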

Finally the code to decode the bitmap header:

This demo (if it were complete) would overlay our bitmap structure onto a bitmap file and print out the signature, bit depth and dimensions of the file. Hopefully, after all that discussion, there should only be one question remaining: How did I get a bitmap file into my DS application?

It turns out there are a lot of ways to get data into your application. You can use a file system and read it in from your CompactFlash or SD card. You can read it in from a Wi-Fi source like the internet or a file share. You can run it through a converter, output the data as a big C array and compile it in, or you can use objcopy to create an object file you can link in.

The simplest way (thanks to WinterMute's wonderful makefile) is the object copy method. To include data in our project we just create a folder called “data” in the project folder (right next to source and include) and drop in a file. If the file name ends in “.bin”, the makefile will pick it up, turn it into an object file, and create a header file declaring some easy-to-use variables.

For instance, in the above demo I dropped a BMP file called beerguy.bmp into the data folder. I then renamed it to beerguy.bin and typed make. A header file called beerguy_bin.h was created, containing the following:

To access the data I just include the header file. I recommend this method, as it works on any media and is simple. If you need file access because the 4MB limit is too restrictive, then libfat, which is included with devkitPro, is a great option.

I think we are finally ready to display this bitmap. All we have to do is copy the palette to the background's palette memory and copy the image data to the background's bitmap memory. Each of these steps has a small quirk.

The palette entries from the bitmap file use 8 bits per color channel, and we need 5. To fix this we chop off the lower 3 bits of each channel. If you remember our conversation on bit operations, this is a simple matter of shifting the number to the right by 3. Copying in the palette is done as follows:
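A sketch of that palette conversion, assuming the 4-byte blue, green, red, unused entries described earlier. `convert_palette` is a hypothetical name; on the DS `dst` would be the background palette memory (libnds's BG_PALETTE), while here it is any array of 16-bit values.

```c
#include <stdint.h>

/* Convert BMP palette entries (4 bytes each: blue, green, red, unused) into
 * 15-bit DS colors by dropping the low 3 bits of each 8-bit channel. */
static void convert_palette(const uint8_t *bmpPalette, uint16_t *dst, int count)
{
    for (int i = 0; i < count; i++) {
        const uint8_t *entry = bmpPalette + 4 * i;
        unsigned blue  = entry[0] >> 3;
        unsigned green = entry[1] >> 3;
        unsigned red   = entry[2] >> 3;
        dst[i] = (uint16_t)(red | (green << 5) | (blue << 10));
    }
}
```

For example, the entry (blue 0x08, green 0x10, red 0x18) becomes red 3, green 2, blue 1, or 0x0443.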

The quirk for the image data is that the image is (for some strange reason) stored upside down. We have to flip the image by reading the bottom of the bitmap into the top of the video buffer.

We do this by starting at line height minus 1 and subtracting the y index. Because video memory only accepts 16- or 32-bit writes, we copy two bytes at a time (this is why you see the width divided by 2 in several places, and why the video memory and bitmap image memory are declared as pointers to short instead of char).
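Here is a sketch of the flipped copy for an 8-bit image, writing two pixels per 16-bit store. It assumes the width is a multiple of 4, so the BMP rows have no padding (BMP rows are padded to 4-byte boundaries; 256-pixel-wide images need none). `copy_flipped` is a hypothetical name, and the left pixel of each pair goes in the low byte, matching the DS's little-endian memory.

```c
#include <stdint.h>

/* Copy an 8-bit-per-pixel BMP image (rows stored bottom-up) into a
 * top-down destination using only 16-bit writes, two pixels per write. */
static void copy_flipped(const uint8_t *bmpPixels, uint16_t *dst, int width, int height)
{
    for (int y = 0; y < height; y++) {
        /* Screen row y comes from BMP row (height - 1 - y). */
        const uint8_t *srcRow = bmpPixels + (height - 1 - y) * width;
        uint16_t *dstRow = dst + y * (width / 2);
        for (int x = 0; x < width / 2; x++) {
            /* Left pixel in the low byte, right pixel in the high byte. */
            dstRow[x] = (uint16_t)(srcRow[2 * x] | (srcRow[2 * x + 1] << 8));
        }
    }
}
```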

Demo Source

[Figure: Displaying the Bitmap]

The Double Buffer

Double buffering is a widely used technique for addressing a common problem with raster graphics: usually you do not want the user to see what you are rendering until you are done rendering it. To ensure this, you have one of two options.

First, you can render only while the display is not drawing: during vblank and hblank. This method works well and is in fact commonly used on the DS, as the hardware does much of the rendering work for us. Often, though, you just need more time to get your scene in order.

This brings us to method number two: the double buffer. For this we simply render to an off-screen buffer. When we are done we make the off-screen buffer visible, and whatever was visible before becomes our new off-screen buffer. This gives us as much time as we need to compose the scene, with the slight drawback of needing a second copy of visible video memory. Fortunately, the DS has more than enough room to accommodate it.

There are actually several ways to implement a double buffer, but the easiest is to use a single background's video memory and allocate one half to the visible buffer and the other half to the non-visible buffer (the “back buffer”).

To demonstrate the use of double buffers let us extend our bitmap demo and animate a series of 10 frames. A friend was kind enough to render some full-screen graphics for me, and for this first go we will do it without a double buffer.

Compile and run this demo and you will see a bit of tearing and a lot of ugliness. The reason for this is my bitmap decode function is very slow and cannot complete in the vblank time frame.

The only thing different between this demo and the one before is the inclusion of 10 bitmap files which we cycle through once per vblank.

Let us use a double buffer and fix the problem.

In this demo we set up much as before, only this time we set the starting point of the background graphics to base 3. If you recall, each base represents 16 KB of memory, and if you are quick with math you will notice that an 8-bit 256×192 image takes up exactly 3 blocks of memory (256 × 192 = 49,152 bytes = 3 × 16 KB). This means we set aside the first three blocks as visible and the next three as our render buffer.

We create a Boolean value which tracks which buffer is visible, and we alternate it each frame. To swap the buffers we simply tell background 3 to render from one base and set the back buffer to the other base. This ensures we are always rendering to the off-screen buffer while displaying the on-screen one.
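The bookkeeping for the swap can be sketched on its own. `DoubleBuffer`, the helper names, and the choice of bases 0 and 3 (matching the three-block split above) are assumptions for illustration; on the DS the visible base would be written into background 3's control register and the back-buffer address used when decoding the next frame.

```c
#include <stdbool.h>
#include <stdint.h>

#define BG_MEMORY_START 0x06000000u  /* start of main background memory */
#define BASE_BLOCK_SIZE 0x4000u      /* each bitmap base block is 16 KB */
#define BUFFER_A_BASE   0            /* blocks 0-2 */
#define BUFFER_B_BASE   3            /* blocks 3-5 */

typedef struct {
    bool showingA;  /* true while buffer A is on screen */
} DoubleBuffer;

/* Base block the background should display. */
static unsigned visible_base(const DoubleBuffer *db)
{
    return db->showingA ? BUFFER_A_BASE : BUFFER_B_BASE;
}

/* Base block to draw the next frame into: whichever one is hidden. */
static unsigned back_base(const DoubleBuffer *db)
{
    return db->showingA ? BUFFER_B_BASE : BUFFER_A_BASE;
}

/* Address of a base block inside main background memory. */
static uint32_t base_address(unsigned base)
{
    return BG_MEMORY_START + base * BASE_BLOCK_SIZE;
}

/* Call once the back buffer is finished: it becomes the visible one. */
static void swap_buffers(DoubleBuffer *db)
{
    db->showingA = !db->showingA;
}
```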

The only other difference to note might seem a bit odd. It turns out that all my bitmaps use a separate palette so I must double buffer the palette as well! For this I use an in memory buffer.

The bitmap decode function copies the palette to my local palette array (old_palette), and when I swap the visible buffer I also swap in this palette so the right palette is loaded with the right buffer.

That is about all I am going to say on double buffers. They are useful and fairly simple to implement and can greatly enhance the look and feel of your program.

Raster 101

Raster graphics are the means by which most early games were rendered. It simply means to draw to a display on a per-pixel basis. We are going to let the DS hardware do most of our rendering for us, but there is something to be said for doing things the hard way every once in a while. We will cover line, circle, and polygon raster graphics and throw in a bit of optimization discussion along the way.

Bresenham Lines

We will begin our raster discussion with line drawing. The deceptively simple task of connecting two points on a 2D display by a series of pixels has been the subject of much research and countless papers. Perhaps we should start by defining a line.

In the majority of mathematical realms a line is an infinite, straight, one dimensional projection through space. To define a line all you need is the location of one point on that line and some indication of its direction.

Because one-dimensionality is tough to achieve on a computer display and an infinite line might take too long to render, we will have to restrict this definition a bit. For us all lines will lie on the plane of the screen, the length will definitely be finite, and that one-dimensional thing will be very loosely applied. Let us take a close look at what a line on a computer display looks like to get a feel for what we need to accomplish.

[Figure: a line drawn on a pixel display]

As you can see, the line can only move in discrete steps of pixels. To render a line we just iterate through one dimension and plug the other into the equation for a line. Let us think way back to geometry class and recall that a line can be defined as follows:

Y = mX + b

Here Y is the axis we are trying to calculate, X is the axis we are iterating through, b is the value of Y where the line crosses the Y axis (the Y-intercept), and m is the slope of the line, defined as the change in Y divided by the change in X (rise over run). I don’t know about you, but when I want to render a line I plan on just picking the two end points and the color and having the algorithm do the rest. We can certainly calculate these values from two points, but there is a slightly less common equation for a line that will be easier to work with:

Y – y = m (X – x)

In this case X, Y, and m are as before, but x and y are the coordinates of any point on the line. As you might notice, we don’t have to worry about the intercept (previously ‘b’) in this form, which simplifies things a bit.

So let us see if we can translate this equation into a line on the screen. First we will need to calculate the slope. Let us define our function as DrawLine(x1, y1, x2, y2, color) where the x’s and y’s are coordinates of the two points we wish to connect.

One thing you should note right away is that the slope will most likely be fractional, requiring us to use floating point math. You may also remember the DS has no floating point hardware, and if you are particularly astute you will very shortly realize all this discussion is leading to a more interesting way of drawing a line.

Some pseudo code for drawing the line using this point slope equation would be as follows:

I don’t recommend trying to compile this, as there are several flaws in the algorithm. First, we are assuming the line changes more in x than it does in y; if that is not the case we will jump more than 1 pixel in y each time through the loop and leave large gaps in the line. Second, we are assuming that x1 is smaller than x2, when really we could have specified any two points for the function. You could of course modify the above and use this method to draw lines on the DS, but it is certainly not the best way to go about it.
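For comparison, here is a compilable C sketch of the naive point-slope approach, restricted to exactly the case it handles: x1 < x2 and a line that changes more in X than in Y. It rearranges Y - y1 = m(X - x1) into Y = y1 + m(X - x1) and rounds to the nearest pixel. `draw_line_naive` and the `framebuffer` stand-in are hypothetical, and the floats are exactly what makes this slow on the DS.

```c
#include <stdint.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192

static uint16_t framebuffer[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for VRAM */

/* Naive point-slope line: Y = y1 + m * (X - x1), m = (y2 - y1) / (x2 - x1).
 * Only correct when x1 < x2, |y2 - y1| <= x2 - x1, and both points are on
 * screen; otherwise it divides by zero, draws nothing, or leaves gaps. */
static void draw_line_naive(int x1, int y1, int x2, int y2, uint16_t color)
{
    float m = (float)(y2 - y1) / (float)(x2 - x1);   /* fractional slope */
    for (int x = x1; x <= x2; x++) {
        float fy = (float)y1 + m * (float)(x - x1);  /* Y on the ideal line */
        int y = (int)(fy + 0.5f);                    /* round; fy >= 0 on screen */
        framebuffer[y * SCREEN_WIDTH + x] = color;
    }
}
```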

This brings us to a more realistic way of rendering lines. It involves only integer math, no division, and no multiplication.

Bresenham lines:

Let us again look at how a line ends up looking on screen and see if we can't find a way to iterate through X and change Y accordingly without calculating the slope. First look at a line that changes more in the X direction than it does in the Y:

[Figure: close-up of a line with a small slope]

From inspection you can probably note the slope on this line is 1 / 3. If we zoom in a bit we see this 1 / 3 slope holds as the line drops down 1 pixel in the Y direction every 3 pixels in the X.

To calculate the slope we would normally do something like this:

But instead let us treat the difference in y and the difference in x separately and call them ydiff and xdiff respectively.

For this case:

It might be helpful at this point to consider how we would draw the above line if we had infinite resolution.

We would first plot a pixel at x1, y1 then move one X to the right. If this were an infinite display we would also move 1/3 of a pixel down and plot the pixel. Because our real display is finite the best I can do is move one pixel in X and 0 in the Y. If I were to continue I would move another pixel in X and another 1/3 of the way down in Y. Again, because we have a finite display and 2/3 of a pixel is pretty meaningless we settle for no change in Y. Finally, as I move for the third time in X I reach a point where I should be a full pixel in Y further down and I can render at the correct location.

This process is the basis of Bresenham’s algorithm.

We keep track of this difference between the line we SHOULD draw and the line we CAN draw in an error term; when that error reaches a threshold we know we can correct by adjusting our Y value by one pixel up or down (down in this case). But what is that threshold and how much do we adjust the error term each time? Well, the easy answer would be to use 1.0 as the threshold and add the slope to the error each time. If we did this things would move along smoothly…unfortunately calculating slope requires a floating point division and tracking the error term would require even more floating point operations. Fortunately there is a simpler way.

We store the difference between x1 and x2 (xdiff) as well as the difference between y1 and y2 (ydiff). These two values are proportional to the denominator and numerator of the slope, respectively. For instance, in the line depicted above the coordinates are (0,10) and (30, 0). This amounts to an xdiff of 30 and a ydiff of 10 (and a slope of 1/3, in case you have forgotten). To render the line we iterate in the X direction (because it changes more than the Y direction) and keep track of how far the line we are drawing is from the line we would like to draw.

We can use a threshold of xdiff and increment the error term by the ydiff (or the other way around for a line that changes more in Y than it does in X). In this case we add 10 to our error term each time we move in X and when it reaches 30 we move one step down in Y. We then reset the error term by subtracting 30 and continue on (notice we don’t set it to zero as in most cases the ydiff and xdiff will not align so well). If we continue we will draw a line at the correct slope from the two points.

That is the basis of the Bresenham algorithm. We just keep stepping in one direction, building up an error term until it reaches a certain point; then we correct by stepping once in the other direction and reduce the error term by the threshold.
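Here is a sketch of exactly the process just described, for the one case covered so far (moving right and down, with xdiff at least ydiff): add ydiff to the error each step in X, and when the error reaches xdiff, step once in Y and subtract xdiff. `draw_line_one_case` and `framebuffer` are hypothetical names, with the array standing in for video memory.

```c
#include <stdint.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192

static uint16_t framebuffer[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for VRAM */

/* Error-term line for one case only: x2 > x1, y2 >= y1 (right and down on
 * screen) and xdiff >= ydiff. Integer math only: no division, no multiply
 * beyond the pixel offset itself. */
static void draw_line_one_case(int x1, int y1, int x2, int y2, uint16_t color)
{
    int xdiff = x2 - x1;
    int ydiff = y2 - y1;
    int error = 0;
    int y = y1;
    for (int x = x1; x <= x2; x++) {
        framebuffer[y * SCREEN_WIDTH + x] = color;
        error += ydiff;          /* how far we have drifted from the ideal line */
        if (error >= xdiff) {    /* drifted a whole pixel: step down in Y */
            y++;
            error -= xdiff;      /* keep the remainder rather than zeroing */
        }
    }
}
```

Traced on the line from (0,0) to (30,10), the error goes 10, 20, 30, so the first three pixels share a row and the fourth drops one line, repeating until (30,10) is reached. Because the error starts at 0, the Y steps happen slightly late; many Bresenham implementations start it at half the threshold to round instead.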

Unfortunately there is a bit more to it. So far we only handle the case where the second point is below and to the right of the first point and the line changes more in X than it does in Y. Fortunately, handling the other cases turns out to be pretty easy.

Here is the final line algorithm, followed by a short explanation of the changes needed to make the code handle the other cases:

Notice we split the cases where the change in Y is greater and where the change in X is greater, so we can loop through either X or Y accordingly. Also, if the x and y values of the points are the opposite of what we expect, we just take the absolute value of the difference and change the x and y steps so we step the other way through the line.
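The following is a sketch (our own, not necessarily identical to the listing above) of a Bresenham line that handles every case in the way just described: it walks whichever axis changes more and uses xstep/ystep of +1 or -1 for lines that run left or up. It starts the error term at half the threshold so steps land on the nearest pixel. `draw_line`, `plot`, and `framebuffer` are hypothetical names, and the array stands in for VRAM.

```c
#include <stdint.h>
#include <stdlib.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192

static uint16_t framebuffer[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for VRAM */

/* Write one pixel, ignoring coordinates that fall off screen. */
static void plot(int x, int y, uint16_t color)
{
    if (x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT)
        framebuffer[y * SCREEN_WIDTH + x] = color;
}

/* Bresenham line between any two points using integer math only. */
static void draw_line(int x1, int y1, int x2, int y2, uint16_t color)
{
    int xdiff = abs(x2 - x1);          /* how far the line runs in X */
    int ydiff = abs(y2 - y1);          /* how far the line runs in Y */
    int xstep = (x2 >= x1) ? 1 : -1;   /* walk right or left */
    int ystep = (y2 >= y1) ? 1 : -1;   /* walk down or up */
    int x = x1;
    int y = y1;

    if (xdiff >= ydiff) {
        /* Changes more in X: one pixel per column, step Y when error fills up. */
        int error = xdiff / 2;
        for (int i = 0; i <= xdiff; i++) {
            plot(x, y, color);
            x += xstep;
            error += ydiff;
            if (error >= xdiff) {
                y += ystep;
                error -= xdiff;
            }
        }
    } else {
        /* Changes more in Y: the same idea with X and Y swapped. */
        int error = ydiff / 2;
        for (int i = 0; i <= ydiff; i++) {
            plot(x, y, color);
            y += ystep;
            error += xdiff;
            if (error >= ydiff) {
                x += xstep;
                error -= ydiff;
            }
        }
    }
}
```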

Now that we have an okay understanding of line drawing let us modify our drawing demo from before to connect the pixels we were drawing with lines. This should fill the gaps and make a nice smooth line as we trace around.

We keep an old X and Y value to hold the previous position, grab the new position, and draw a line between them. Finally we update the old x and y and repeat. This code only draws a line once the pen has been down for at least two frames, because we need two points to draw a line. This is done by not drawing a line on the frame where keysDown() reports the touch.
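The old-point logic can be sketched independently of the hardware. Each `PenSample` is a hypothetical per-frame record: `touching` plays the role of keysHeld() & KEY_TOUCH, `justPressed` the role of keysDown() & KEY_TOUCH, and x/y the pixel coordinates touchRead() fills in. Instead of drawing, the sketch collects the segments that DrawLine would be called with.

```c
#include <stdbool.h>

typedef struct { bool touching; bool justPressed; int x, y; } PenSample;
typedef struct { int x1, y1, x2, y2; } Segment;

/* Turn per-frame pen samples into line segments. On the first frame of a
 * touch we only remember the point; on later frames we connect the
 * remembered point to the new one, then remember the new one. */
static int samples_to_segments(const PenSample *samples, int count, Segment *out)
{
    int oldX = 0, oldY = 0;
    int n = 0;
    for (int i = 0; i < count; i++) {
        if (!samples[i].touching)
            continue;                   /* pen up: nothing to draw */
        if (!samples[i].justPressed) {  /* second frame onward: connect */
            out[n].x1 = oldX;
            out[n].y1 = oldY;
            out[n].x2 = samples[i].x;
            out[n].y2 = samples[i].y;
            n++;                        /* on the DS: DrawLine(...) here */
        }
        oldX = samples[i].x;
        oldY = samples[i].y;
    }
    return n;
}
```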

When putting the code together, do not forget to #include <nds.h> and either copy and paste the DrawLine function above the main function or #include a header file with it defined.

Blitting Things

Drawing Pictures

Polygons

Circles

This code is in Pascal, but a few changes will easily turn it into C++:

To draw a Circle

Fill the circle (or remove these four line-drawing calls for an empty circle)

Don’t miss the end of the code
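As a C alternative to the Pascal listing, here is a sketch of the midpoint circle algorithm, which like Bresenham's line uses only integer arithmetic: it walks one eighth of the circle and mirrors each point into the other seven. With `fill` set it draws four horizontal spans per step instead, one common way to fill. This is our own sketch, not a translation of the Pascal code; `draw_circle`, `hline`, `plot`, and the `framebuffer` stand-in are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_WIDTH  256
#define SCREEN_HEIGHT 192

static uint16_t framebuffer[SCREEN_WIDTH * SCREEN_HEIGHT]; /* stand-in for VRAM */

/* Write one pixel, ignoring coordinates that fall off screen. */
static void plot(int x, int y, uint16_t color)
{
    if (x >= 0 && x < SCREEN_WIDTH && y >= 0 && y < SCREEN_HEIGHT)
        framebuffer[y * SCREEN_WIDTH + x] = color;
}

/* Horizontal run from xa to xb (xa <= xb) on row y. */
static void hline(int xa, int xb, int y, uint16_t color)
{
    for (int x = xa; x <= xb; x++)
        plot(x, y, color);
}

/* Midpoint circle centered at (cx, cy) with radius r. */
static void draw_circle(int cx, int cy, int r, uint16_t color, bool fill)
{
    int x = r;
    int y = 0;
    int d = 1 - r;  /* decision term: sign says whether the midpoint is inside */
    while (x >= y) {
        if (fill) {
            /* Four spans between mirrored points cover the interior. */
            hline(cx - x, cx + x, cy + y, color);
            hline(cx - x, cx + x, cy - y, color);
            hline(cx - y, cx + y, cy + x, color);
            hline(cx - y, cx + y, cy - x, color);
        } else {
            /* Mirror the octant point into all eight octants. */
            plot(cx + x, cy + y, color); plot(cx - x, cy + y, color);
            plot(cx + x, cy - y, color); plot(cx - x, cy - y, color);
            plot(cx + y, cy + x, color); plot(cx - y, cy + x, color);
            plot(cx + y, cy - x, color); plot(cx - y, cy - x, color);
        }
        y++;
        if (d < 0) {
            d += 2 * y + 1;           /* midpoint inside: keep x */
        } else {
            x--;                      /* midpoint outside: step x inward */
            d += 2 * (y - x) + 1;
        }
    }
}
```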

Posted by dovoto


This day's content is meant to be read, or even skimmed, in a single sitting. It is not meant for you to understand or be able to use much of this information at this point, but only to serve as an introduction to what is inside the DS.

Hardware Overview

The Nintendo DS is rich in features. It possesses one of the most advanced 2D rendering systems ever seen on a console system, abundant memory resources (many, many times that of the SNES), dual processors capable of outperforming the Nintendo 64 (floating point operations aside), integrated wireless networking, a modest 3D system with an easy-to-understand interface, a microphone, and a touch screen.

What follows is a brief description of these features and a foreshadowing of the things you might accomplish with the knowledge gained in this guide.

Memory Layout

The memory footprint of the Nintendo DS is one of its more intimidating features for newly introduced console programmers. Understanding where memory is and what its uses are is key to getting the most from your applications and in many cases it is key to doing anything at all. Often a picture can be helpful in understanding how memory is arranged.

(Unless otherwise stated the data width for each bus is 16 bits.)

Below is an excerpt from GBATEK showing the memory mapping in a bit more detail.

A few of these memory regions warrant further explanation.

Main Memory is the 4 megabyte block of RAM which generally holds your ARM9 executable as well as the vast majority of all game data.

Both the ARM7 and the ARM9 can access this memory at any time. Any bus conflicts are resolved in favor of the processor that has priority (the ARM7 by default, but changeable via a control register), causing the other processor to wait until the first has finished its operation.

Although it is possible to execute both ARM7 and ARM9 code from main RAM at the same time, devkitPro defaults to placing the ARM7 code into the 64KB of fast IWRAM for performance reasons. Official games generally place both ARM7 and ARM9 executables into main memory, after which the ARM7 copies the majority of its own code to IWRAM.

ARM 7 Fast Ram (IWRAM)

The ARM7 has exclusive access to 64KB of fast 32 bit wide memory. It is this region that contains the ARM7 executable and data. When designing ARM7 code it will be in your interest to keep the binary small.

ARM 9 Caches

The ARM9 contains both a data cache and an instruction cache. Although the operation of these caches is a bit complex and really out of scope for this document, a few things are worth noting.

Main memory is cacheable by default. This means all data and code accessed from main memory will be stored temporarily in the cache. Because the DMA circuitry and the ARM7 do not have access to the cache, you will often get unexpected results if you attempt to DMA from main memory or share data between the ARM7 and ARM9 via main memory.

To help you use the cache effectively, the mirror of main memory beginning at 0x02400000 is not cacheable. The library also provides several functions (DC_FlushRange and DC_FlushAll, for example) that flush the data cache so that main memory is in sync.

Although the cache adds a certain level of complexity its boost to performance is well worth this small inconvenience.

Fast Shared Ram

There are two small 16KB banks of fast 32 bit RAM, each of which can be assigned to either the ARM7 or the ARM9, but a bank cannot be accessed by both CPUs at the same time. Commonly, both banks are mapped to the ARM7, as they form a contiguous block with ARM7 IWRAM, effectively giving the ARM7 96KB of RAM.

Video Ram

The Nintendo DS has nine banks of video memory which may be put to a variety of uses. They can hold the graphics for your sprites, the textures for your 3D space ships, the tiles for your 2D platformer, or a direct map of pixels to render to the screen. Figuring out how to effectively utilize this flexible but limited amount of memory will be one of the most challenging endeavors you face in your first few days of homebrew.

Below is a table of the banks along with a description of the uses each can be put to. You should not worry about understanding it at the moment, but it might be handy to bookmark or print out for later use.


Virtual Video Ram

In order to function, the 2D systems need RAM. One of the major differences between the 2D graphics engine on the Game Boy Advance and those on the DS is that the DS has almost no memory dedicated to the 2D system. Instead of setting aside a fixed amount of video memory for 2D, it allows you to map the video RAM banks into the 2D engines’ memory space.

This might be a bit difficult to grasp at first. An example might be helpful.

Scenario: You want to render a tile based map to the screen using the main 2D graphics engine.

Because you are an uber Nintendo DS programmer you already know two things:

  1. Where the 2D graphics engine expects the map and tile data to be
  2. Which video RAM banks can be mapped to this “virtual” 2D graphics memory to hold your tiles and map.

Solution: Tell the Nintendo DS to map a video RAM bank to the right place. In this case we might map video RAM bank A (VRAM_A) to 0x06000000 for use as 2D background memory, but we could have chosen another bank (it turns out almost all VRAM banks can be mapped to main background memory).

We will revisit this topic when we create our first few 2D demos.

This might seem intimidating and difficult at first, but it does offer you a fair bit of flexibility and power over where everything is.

Sound Hardware

What would a game be without its complement of blips, bleeps, and chip tunes?

Sound and music are an important piece of any good game and to ensure your next graphical adventure is accompanied by an equally astounding audio experience the DS comes equipped with some impressive hardware.

Sixteen independent audio channels can play digitized music in 8-bit PCM, 16-bit PCM, or ADPCM format. Each channel has its own frequency, volume, panning, and looping controls, allowing for virtually CPU-free, MOD-quality playback.


Wifi

What is there to say, other than that it supports communication with standard 802.11 access points? A full socket library has been implemented, which allows PC network code to be ported to the DS.


User Input

User input is where the DS excels, and it is the basis for its much-lauded inventive gameplay. Eight buttons, a four-direction D-pad, a touch screen, and a microphone make for an interesting mix of possibilities.

Touch Screen

As I am sure you have already noticed, the DS has a touch screen. It is very standard in operation and talks to the ARM7 over a serial (SPI) interface. Using the default ARM7 binary from libnds causes the touch screen values to be read once per frame and reported to an area the ARM9 can read.

In the next chapter we will cover how to access the touch screen values in code.


Buttons

Button presses are detected by reading registers on the ARM7 and ARM9. Some inputs can only be read by the ARM7: the lid (hinge) open/close state, the X and Y buttons, and the pen-down “button” are all detected on the ARM7 and recorded in shared Main Memory for the ARM9 to read.


Microphone

Perhaps one of the most interesting additions to the Nintendo DS is the microphone. To be honest I have not played with it much, but many interesting ideas come to mind, and we will definitely do a demo or two that captures input from it.

Real-Time Clock

Being able to know what time it is to the second is pretty handy. Your game can respond differently based on the time of day, you can tell how long it has been since the player last played the game, or it can be used as a simple in-game clock. Best of all, reading the date and time is a snap.

Upgradeable Firmware

The firmware on the Nintendo DS is stored in flash memory and can be overwritten with custom firmware. For more information on how to do this, check here.

Upgrading the firmware is useful to developers because it allows you to bypass the RSA check when downloading wifi demos. This means we can send our own .nds files to our DS over wifi instead of just officially signed ones. Also, the hacked firmware checks the GBA slot before booting; if it finds an .nds file signature there, it executes the code automatically, eliminating the need to use a passthrough device each time you wish to run code from your GBA-slot flash cart.

If you currently use a passthrough device to boot your .nds files from the GBA slot, upgrading the firmware is an easy and relatively safe process.

Graphics Overview

Believe it or not, the Nintendo DS is a very capable, very advanced graphics powerhouse, with an interesting combination of 2D and 3D rendering hardware.


The Super Nintendo is considered by many to be the best 2D console ever made (by many I really mean me…nobody else counts). The SNES had a 16-bit 3.58 MHz processor, 128KB of 8-bit RAM and 64KB of video RAM. By comparison, the Nintendo DS has a 32-bit 67 MHz processor, 4MB of main RAM, and 656KB of video RAM, and that’s not counting its little caches of fast 32-bit RAM or its second 33 MHz processor. This is a very capable machine.

There are two separate graphics cores on the DS. They are referred to as Main and Sub graphics cores. Each core has similar features which vary depending on their mode of operation. The major differences between the cores are as follows:

  • The main core has two extra modes which are capable of rendering large bitmaps.
  • The main core can give up one of its background layers to the 3D engine.
  • The main core can bypass the 2D engine and render from memory to the screen directly in what is often referred to as frame buffer mode.

As alluded to above, each core operates in one of several modes. Below is a table of these modes.

Graphics Modes

Main 2D Engine

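Mode   BG0        BG1    BG2            BG3
0      Text/3D    Text   Text           Text
1      Text/3D    Text   Text           Rotation
2      Text/3D    Text   Rotation       Rotation
3      Text/3D    Text   Text           Extended
4      Text/3D    Text   Rotation       Extended
5      Text/3D    Text   Extended       Extended
6      3D         -      Large bitmap   -

Sub 2D Engine

Modes 0 through 5 work as above, except that BG0 is always a text background (only the main core can display 3D). The sub engine has no mode 6 and no frame buffer mode.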

Text backgrounds are general purpose tiled backgrounds; rotation backgrounds are also tiled and can be rotated and scaled. Extended rotation backgrounds support a larger set of tiles (at the expense of a larger map), support more palettes, and can operate in a bitmap mode as well as tiled modes.

These modes and background types will be explored as we go along.


3D Graphics

It may not possess the polygon-pushing, texture-blending, hardware pixel shading capabilities of current-generation GPUs, but what the Nintendo DS lacks in performance and eye candy it makes up for in features.

Limited to 6144 vertices per frame (about 2048 triangles or 1536 quads), the 3D system might seem a bit sparse. But given the small screen size, a lot can be done with even this small number of available points.

Hardware fog, lighting, and transformation along with non blending texture mapping, toon-shading, and edge anti-aliasing make up a rather impressive set of features for an otherwise lackluster 3D machine.

The 3D core operates as a very OpenGL-like state machine, allowing much of its functionality to be wrapped in GL-style code. One major difference between OpenGL and the DS core is the absence of floating point support: all operations on the DS are carried out in fixed-point arithmetic.

If you want to get a jump on 3D look at the 3D examples included with libnds and the NeHe tutorials the source code originated from.

Toolchain Explained

Understanding how your code goes from being a text file to being executed on the Nintendo DS will become very important as your projects progress in complexity. To aid in that understanding we are going to recreate Demo 1 from scratch and build it step by step from the command line. This section is not for the faint of heart and can safely be skipped until such time as you find yourself wondering just how your tools are building something which runs on the DS.

Before we begin, there are a few terms you are likely familiar with but that I feel it necessary to go on about anyway.


Compiler

The compiler is the first tool you pass your C source through. It is responsible for interpreting that code and translating it into machine-specific assembly language. From there the assembly language is further reduced to its binary machine code equivalent by another tool known as the assembler (which the compiler calls for you).

The output of the compiler is generally not executable but is instead in what is known as an object file format. Although the instructions have been translated to machine code binary, the decisions about where that code and associated data is to be physically placed in memory have been left undecided.


Linker

The tool that combines the object files and assigns physical addresses, so that functions spread across a multitude of object files can operate coherently, is called the linker. By passing your object files to the linker you can produce an executable binary file.

Because the linker is responsible for determining where things should be placed physically within the NDS system it must be told a fair amount of information about the memory layout of the DS. This description is located in a linkscript file which describes both the memory layout of the DS and the way in which we want the different regions of our code to map to it. Here is a small piece of the devkitARM default linkscript for the arm9 (yes there is a separate one for the arm7 since it has a different memory layout).

There is much more to this script, most of which is utterly incomprehensible and any of which can have extremely difficult to understand consequences if meddled with. It is good to understand that these scripts exist and what their general purpose is, but actually editing them is well beyond the scope of this document.

The snippet above was chosen because it is somewhat comprehensible; it describes the 4 primary memory regions of the DS that will be used to contain code and data.

  • rom is the GBA cartridge space; it is 32MB in size and begins at an absolute address of 0x08000000.
  • ewram is external working RAM, the slow 4MB of main memory of the DS.
  • dtcm stands for data tightly coupled memory, a special area of memory on the ARM9 intended for use as fast data memory. The standard link script places the stack in this area. dtcm is a mere 16KB, so be careful with those local variables.
  • itcm stands for instruction tightly coupled memory, another special area intended for use as fast instruction memory. This area is 32KB and may be used for small functions which need to be fast. libnds uses this area for the interrupt dispatcher.

The rest of the script file deals with mapping of your code to these regions (read only data, code, variables, stack, global data, constructors for C++ stuff, etc). It does this using an obscure and rather intimidating expression type language that I will not even pretend to understand completely. Fortunately a few people like Jason Wilkins, Jeff Frohwein, and most recently (and most successfully) Dave Murphy have done the grunt work; what was once something which caused countless "interesting" issues can now be relied upon confidently to just work.

Generally speaking there is no need to worry about the linkscript unless you have some pressing need to change where things go in memory. Everything except the stack goes in main memory by default, so all you have to worry about is fitting everything into 4MB.

Build A Demo The Hard Way

To sum things up, you first compile your source files into object files and then link them into a binary executable. Normally this would be the end of the process but, alas, our little DS is a bit more complex: it has not one but two processors, and each needs its own separate binary executable.

What are we to do? Create both of course. The process is identical with the only difference being we use the linkscript for ARM7 when linking the arm7 object files.

The final step in the process is packaging the binaries into a single file that we can then load onto the DS. Fortunately a tool included in the devkitARM package does just this. Official NDS game carts happen to use a file format which suits our needs rather well. This format includes a small header along with a logo and a short description of the .nds file in several languages, followed eventually by the executable binaries for the ARM7 and ARM9. The logo and description text show up when you boot a game card from the firmware or start to download one over wireless multiboot. (After a bit of investigation it turns out the logo is not actually embedded in the header but exists separately; the header does, however, point to it.)

Now that we have a feel for the process let us create a full .nds file from the command line (so we can confidently never do it again).

Okay…one quick thing to mention. When we created demo1 you may have noticed we had no arm7 code in the project. The reason we are able to get away with this is there is an arm7 executable binary already present in the devkitPro package that you can include by default. This binary performs some very basic things such as read the touch pad and real time clock as well as some very simple sound playback. For anything more advanced you will be providing your own arm7 code or at least modifying the supplied code.

Now…on to the demo. Follow the instructions from Day 1 on building your first demo with the following exception: Instead of grabbing the arm9 template grab the template labeled combined. Within you will find an arm9 folder and an arm7 folder. Replace the code in the template.c inside the arm9 folder with the demo1 code (or leave it the same as it does not really matter what we compile for this exercise).

You will notice a makefile inside the combined directory. If you were to navigate to that directory in a DOS/terminal window (try this now, in fact), you could simply type make and the scripts contained inside the makefile would do all the steps we are about to do by hand.

Here are the commands needed to build the nds file from the command line.

Before we explain what is going on it is best to take a moment and absorb just how much compiling from the command line sucks…

Good, now that we know let us figure out what we did.

What is not shown is me setting my path so that the devkitARM tools could be found. On Windows this is simply:


First we invoked the compiler on the arm9 template.c file. This translated it into an object file (template.o). We passed it the file as an argument, as well as the include directory for the libnds header files we are using. Because libnds does different things depending on whether you are building an ARM7 or an ARM9 binary, we must define ARM9 with the -D option.

Next we link the file into an executable (.elf file). We pass the object file as an argument, we tell it we would like to include libnds (-lnds9, the ARM9 build of the library), and then we tell it where to look for the linkscript and which default libraries to use (-specs=ds_arm9.specs).

Because our loader does not handle the .elf format very easily, we strip away all that extra information using objcopy. This leaves us with a nice flat binary for execution. (The .elf file contains debug information and other useful things; you will need it to use the remote debugger or the source-level debugger in no$gba, if you can afford such luxuries.)

This entire process is repeated for the arm7 leaving us with an arm7.bin and an arm9.bin. We next combine these binaries into an .nds file using ndstool.

If there is anything you should take from all this, it is the convenience of the template makefiles. All you do is drop .c/.cpp/.s files into the source directories for the processors, .h files into the include directories and .bin data files into the data directory (more on this when we talk about getting your data into your program) and type make. The script automates this entire process in an efficient and easy to use fashion, reducing the entire painful process above to a single command: make.


Today we took a short peek at the capabilities of the DS and learned a bit more about the process of creating an executable from code. Much of the hardware description was intentionally vague, as the real detail will come in the following chapters.

Tomorrow we will begin looking at the hardware in detail when we explore raster graphics.

Posted by dovoto

Who Should Read This?

This tutorial is geared towards people with a moderate understanding of the C language, an interest in programming games or applications, and a love of Nintendo’s DS family of handheld consoles. These are the only skills required.

The end goal of these serial endeavors is to leave you with a broad understanding of how to talk to the DS. We will not be making a game nor doing any complex projects. In fact, all of the example demos created along the way will likely fit on a single page.


Here is the very short list of things you need to get started:

  • A PC (or a Mac, or whatever you Linux freaks call a Linux box)
  • devkitARM
  • A Nintendo DS with some way to load homebrew

In fact, you don’t even need a DS. There are software emulators which will run your code like it’s the real thing, and they are even fairly accurate these days. But let’s face it, half the point is being able to see it run on the real hardware, and there is something to be said for testing on an actual system (a lot to be said).

Why The Nintendo DS?

There are many reasons the DS makes a great game development platform for amateur developers. The first is its amazing array of features. Dual screens, dual processors, touch screen, microphone, wifi, powerful 2D graphics engine (two of them actually), wonderful 3D hardware that is limited in performance but rich in abilities. So many things for homebrew developers to explore…this makes programming the DS very satisfying.

Although one could argue that a PC also has these features, the DS has them all as standard. You know exactly which hardware you are going to be developing for, and you have a direct channel to it (no silly operating system always trying to get in the way). And let’s face it, it is much cooler to show a friend some code running on your handheld Nintendo DS than on some computer screen.

Another reason the DS is a viable platform for us is that the homebrew compiler tools and libraries are solid, well-maintained, and very easy to set up. Development on the DS is now nearly as simple as development on a PC thanks to people like Dave Murphy.

What You Need To Know

C or C++ experience would be very beneficial, as would any PC game programming knowledge. Some might disagree, but the DS console is a great place to learn game programming, even if your only experience is a few lines of QBasic code or a Java class in high school. If you have never written C/C++ code before, you may want to look around for a few Internet tutorials or a decent C book before adventuring into the console programming world. But I won’t require you to be a guru in order to keep up.

If you don’t know C and don’t feel like finding a good book or taking a class then I urge you to at least wander over to Beej’s Guide to C Programming and give yourself a few days with it. Play around a bit and write some code. Don’t get discouraged if you don’t get it all… it takes a number of years to get really good with C but if you find yourself hating the process then perhaps you should look for other hobbies.

Installing your tools (Windows)

You will need a compiler toolchain to build your DS applications. You will want a C/C++ compiler, a linker, libraries, and header files as well. The compiler package (a custom build of GCC) is maintained by Dave Murphy and can be found at

On this site you will find a devkitPro package for Windows, Linux, and OSX that contains not only the compilers but also the homebrew header files, libraries, and examples you will need.

Installation is a snap. If you are on Windows (preferably Windows 2000 or newer) you simply need to run the installer. On Linux or OS X just follow the guide (this involves unzipping the package to a folder of your choice and setting 2 or 3 environment variables). Dave does a much better job explaining the installation than I could, so follow his setup instructions. The site also has a FAQ section which addresses any issues you might encounter, as well as how to integrate the toolchain with programming environments such as Visual Studio and Eclipse. For the remainder I will assume you have followed the instructions on that site and have a directory structure similar to the one below:

The libnds examples are installed automatically with the toolchain. Many of the examples will be used in these tutorials and FAQs.

Installing your tools (Mac OS/Linux)

This section is a bit dated. I recommend just following the getting started guide on devkitPro, which points you to some install scripts.

Building A Demo And Testing The Installation

devkitPro supplies about 30 examples with devkitARM that demonstrate how to use the tools and do certain things on the hardware. To test your installation of the toolchain navigate to one of the example folders in a terminal or DOS window and type make.

If you are using something other than Windows you may need to grab these examples as a separate download.

If you get no errors you will have a .nds file in the root directory (if you do encounter errors check the FAQ section of and feel free to post your issues on as well). You can run this .nds file on an emulator or directly on a Nintendo DS.


At this point, if you have not already, install DeSmuME. It is actively developed and has lots of nice features for debugging.

Some other emulators

  • Dualis
  • NO$GBA

Running The Demo On Hardware

todo: add some info about flash cards…maybe even find an affiliate?

First Demo

Our first demo will be a simple one. We will initialize our DS and paint one of the screens a wonderful red color.

First copy the template folder from the examples pack installed with devkitPro. For me the template is installed at C:\devkitpro\examples\nds\templates\arm9, but your installation directory may vary.

Copy the entire arm9 folder to a new directory (one without any spaces in the path, in case your OS allows such things). Name the folder something useful like HelloWorld. Once it is copied, navigate to the folder called “source” and open main.c (or main.cpp or template.c; there is generally only one C source file in each example or template, and they are not named terribly consistently).

You can open this source file in any text editor. Later we will talk about setting up some more productive editors, but for now any notepad, vim, Visual Studio, or Code::Blocks sort of editor will do.

Delete everything, then type the following:


Now navigate to your demo’s main directory in a DOS or shell window and type “make”. Your demo should be built, and you should have a nice HelloWorld.nds file you can run in your emulator. It should look like this:

Congratulations! You have built your first DS demo…now we just need to figure out what is going on. We are not going to discuss the code in this demo today…we have to leave some secrets for tomorrow….

Posted by dovoto

In this example we are going to combine windowing and interrupts to show how to mask non-rectangular portions of the screen. Because this is an introduction to such effects, we are going to aim low. How low? We are simply going to attempt a spotlight effect (a triangle really…okay, so a trapezoid really really) that shows us our background through an altered window.

Thanks to Henke37 for submitting the original example. Even though I rewrote his code for the sake of simplicity, I did steal some of the interesting bits.

Before we get to the details, let’s take a moment and look at the entire source code for the project:

This example isn’t really about windowing, nor even about HBlanks, but since understanding both is critical we shall briefly review these topics.

The DS has 3 hardware windows which can be used to mask parts of the screen. Two of them are simple rectangles: you give the DS the top, bottom, left, and right edges of the rectangle and it creates a mask of that size. We can control which layers are rendered inside the window and which ones are rendered outside it.

The third is an Object Window, which uses a sprite object as the mask (basically any pixel in the sprite that is not zero becomes “in window”).

All that is required to use a window is first to turn it on, second to define whether you want your layers or objects to appear inside the window or outside all windows, and finally to define the top, left, bottom, and right positions of the window.

The first bit of code sets up our video mode and background as is standard for displaying a bitmap background:

Now that our background is loaded, we must turn on our window and tell the DS we want our background to appear only “inside” it. If we neglect this step, it will not render our background at all.

A layer can either be inside a window or outside all windows.

Notice we don’t set bounds on the window yet. We will do that in our interrupt handler…let’s take a look at the first bit of our interrupt code:

Remember that the DS renders the screen from left to right, top to bottom, drawing one horizontal line at a time. In between each line there is a brief pause where we can adjust things that we want to change per line. In our case, we want to adjust the left and right bounds of our window.

The two lines above install our interrupt handler for the horizontal blank period and then turn it on. Hopefully this is somewhat self-explanatory. If not, take a look at the irq examples (which are likely not written yet).

And finally we get to the bit you have all been waiting for, the interrupt handler: hblankIrq.

Okay, you may not have seen the REG_VCOUNT register before. It is a pretty useful hardware register that keeps track of which horizontal line (scanline) the DS is currently rendering…just for times like this. We can use it to figure out where we are in the render process. The first bit of math calculates how far from the center of the window opening we want the edge of the window to be; by increasing this with each line, we make our window wider as we move down.

Take a moment to figure out the math by stepping through a few values of REG_VCOUNT to convince yourself you understand it.

Normally a divide would be ill-advised in an hblank interrupt. There is not much time between horizontal lines while the graphics hardware is idle…and division takes a long time on the DS (the ARM9 has no hardware divide instruction and so must do it in software; there is a divider co-processor, but that’s a story for another day). In our case, our hblank code is simple enough that we can get away with it. Without the divide, the spotlight would widen much too quickly, and doing it properly would cloud the code.

Once we have calculated the current left and right values of our window opening, we set the window bounds using the windowSetBounds function.

The final bit of code is the adjustment of center in the main loop. This should be rather self-evident:

That’s about it. Feel free to drop questions below and I will beef up the explanation as necessary.

Posted by dovoto

Below is a quick video demonstration of using the Pern Edit map editor to create a 3 layer background. The source used in the demo is provided.

The following video shows the creation of a simple 3 layer map, from downloading and installing the application to writing the DS code to display it.

Get Pern Edit

Posted by dovoto

This is all that is required to print the words “hello world” to the screen:

Posted by dovoto

The new site is up…perhaps I will load some content?

Posted by dovoto