Project Vol
"Digging, Building, Fun. Yes, really!" A game project taken from the very beginning, hopefully to the very end. We hope to create a 'digger' style game with good gameplay and interaction with the environment, using some exploratory programming techniques. This is not a game clone; we hope to impose good gameplay design and programming improvments, to create something new.
Friday, 8 April 2011
Parked!
Monday, 28 February 2011
All Quiet for now, but we're working in the background
These blogs tend to look like a one-shot wonder if they start with many posts, and then suddenly stop, so this is just a quick update.
We've been looking far ahead and scoping out the rest of the project. A lot of work has been done on the gameplay mechanisms and balance (but we won't be revealing that until the very late stages, because we think it's going to be good, and worth stealing!). We've also looked extensively at rendering, in particular how the view is prepared for a player, and the rendering of different types of game object in the same view (for example, mixing custom rendering with conventional hardware-rendered 3D geometry).
So although this blog will seem to slow down, be assured that we're doing lots of planning in the background. When we're certain of a design direction, or when we're confident we can write a balanced article about design choices, we'll post more.
Tuesday, 15 February 2011
Stripping the Zeros
This post describes our node structure for a sparse voxel octtree (SVO), with optimizations that allow optional attributes to be stored frugally.
SVOs are fairly efficient at storing volumes in which large chunks of landscape consist of identical voxels, but when we try to associate different types of data as a 'variant' or 'union' record in each node, the structure suffers from unnecessary bloat: variant records take the length of the largest variant, so the unused fields still take memory.
Our code identifies a small number of 'parts' that may be included in a node. If we have three parts, then there are eight possible composite combinations of these parts (by binary combination: {fff, fft, ftf, ftt, tff, tft, ttf, ttt}), and each of these composites may have a different record length.
If we then store each combination (variant) in its own array or list, we avoid the wastage of unused records. The best improvements are found in scenarios that mix long records with short records.
Cons: it's possible that fragmentation can occur with the reallocation of memory when up-sizing a list. We have not addressed this potential problem, which may disappear with the virtual addressing that exists in most modern OSs, or be solved by adopting a simple hierarchical memory manager for the list or array.
Example
In the following, we present a simple example of the technique, which uses three attributes. Note that our actual code will include more structure, and some details are omitted:
- Basic (TCNV_Part_Basic), which contains:
  - a pointer to the parent node,
  - information about which attributes are set in this node (remember, just 1 byte!),
  - information about which attributes are set in sub-nodes (also 1 byte).
- Material (TCNV_Part_Material), which contains:
  - just a reference to an entry in the list of materials.
- Children (TCNV_Part_Children), which contains:
  - 8 × octtree pointers to sub-nodes. Null values mean that there is no sub-detail in that sub-volume, and the volume is accurately represented by the current node.
- ...[and so on]...
Any data that is missing from a child node may be obtained from its parent nodes. We note which attributes are stored in a node using a set (a row of bits, of type TCubeFeatures, where each bit denotes whether the corresponding attribute is present).
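As a minimal sketch (the attribute names here are placeholders of ours, not final), TCubeFeatures could be declared as a Pascal set that packs down to a single byte:

type
  TCubeFeature = (cfMaterial, cfChildren, cfLighting); // hypothetical attribute flags
  TCubeFeatures = set of TCubeFeature; // Free Pascal may need {$PACKSET 1} to keep this to one byte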
The parts themselves are typed, fixed-length records, named with the prefix "TCNV_Part_" to denote "Type: Cube Node Variant - Part":
TCNV_Part_Basic = record
  Parent: PCubeNode; // 32 bits
  FeaturesThis: TCubeFeatures; // 8 bits
  FeaturesChildren: TCubeFeatures; // 8 bits
end;
TCNV_Part_Material = record // 16 bits
  Material: TMaterialID;
end;
TCNV_Part_Children = record // 8 × 32 bits
  ChildNodes: array[TUnitDirectionEnum] of PCubeNode;
end;
These types are then combined into composite records:
TCNV = record // 48 bits
  Basic: TCNV_Part_Basic;
end;
TCNV_M = record // 64 bits
  Basic: TCNV_Part_Basic;
  Material: TCNV_Part_Material;
end;
TCNV_C = record // 304 bits
  Basic: TCNV_Part_Basic;
  Children: TCNV_Part_Children;
end;
TCNV_MC = record // 320 bits
  Basic: TCNV_Part_Basic;
  Material: TCNV_Part_Material;
  Children: TCNV_Part_Children;
end;
...each of the Composite types can be stored in arrays where the elements are of fixed length. With sensible re-sizing of the allocated memory, we can minimize the amount of memory that would otherwise have been wasted by conventional variant records. In Code 2 (above), the largest record is 320 bits (40 bytes), and the smallest record is 48 bits (6 bytes), which gives a 'saving' of 272 bits (34 bytes, or 85%) for every 'leaf' node.
Because the data types and field offsets differ in each case for any given field (such as 'Material'), we'll need to wrap these structures in methods that map requests onto our non-standard variant structures.
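As a rough sketch of what such a wrapper might look like (an illustration only: it assumes PCubeNode points at the TCNV_Part_Basic that begins every composite, that each composite type lives in its own array, and it uses the hypothetical cfMaterial flag from earlier):

type
  PCNV_M = ^TCNV_M; // Material always follows Basic, in both TCNV_M and TCNV_MC

function GetMaterial(Node: PCubeNode): TMaterialID;
begin
  // Walk up the tree until we reach a node that actually stores a material
  // (assumes the root node always does).
  while not (cfMaterial in Node^.FeaturesThis) do
    Node := Node^.Parent;
  // Basic is the first part of every composite, so the composite starts at the
  // same address as the node pointer; casting to PCNV_M locates the Material field.
  Result := PCNV_M(Node)^.Material.Material;
end;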
It will be interesting to see whether this works in practice. If it does, then we'll be able to add many types of sparse data to existing matter in the world model, with very little penalty. This is how we'll inject lighting data into our world.
Further Optimizations
This is getting ahead of ourselves a little, but we should mention that the above technique still has lots of room for improvement. Further optimizations are possible by abstracting the subdivision and material packing, which we might need for small-scale detail, to bring massive efficiencies in specialized cases.
Material Packing
It's likely that only a few materials will be used within a player-sized cube, so we might be able to (for example) optionally treat sub-detail as an 8 × 8 × 8 array of bits, with value 0 representing a 'background' material, and 1 representing a 'foreground' material. For those of you who know 8-bit computers: some machines (the ZX Spectrum, and the BBC Micro's Teletext mode) allowed the creation of primitive graphics by letting each 'text cell' contain a 1-bit image, mapped onto foreground and background colours.
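A minimal sketch of how that packing might look (names and layout are our own, purely illustrative): 8 × 8 × 8 = 512 bits, i.e. just 64 bytes per node, plus the two material references.

type
  TPackedSubDetail = record
    Background: TMaterialID; // material used where the bit is 0
    Foreground: TMaterialID; // material used where the bit is 1
    Bits: array[0..63] of Byte; // 8 × 8 × 8 = 512 one-bit cells
  end;

function SubDetailMaterial(const D: TPackedSubDetail; X, Y, Z: Integer): TMaterialID;
var
  Index: Integer;
begin
  Index := (Z * 64) + (Y * 8) + X; // 0..511, with X, Y, Z each in 0..7
  if (D.Bits[Index shr 3] and (1 shl (Index and 7))) <> 0 then
    Result := D.Foreground
  else
    Result := D.Background;
end;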
Subdivision
At the expense of complicating all the supporting ray-casting routines, we can specify alternate ways of slicing up volume nodes into sub-volumes. It's well known that octtrees can be very fussy about the placement of a plane surface: if a change of medium corresponds to the boundary of a large node then it is very efficient, but if that plane sits just one detail voxel inside the neighbouring large node, then that neighbouring node is sliced eight ways at each level of detail until the detail node is reached - very inefficient, especially if your plane happens to be a flat sea or landmass covering the whole map (it can make megabytes of difference to the storage overhead on a 1024^3 world map).
Rewind: Some points from our research
There are lots of projects out there that are playing with voxels. Only a few have reached a demonstrable state, and most of those seem flawed in some important way. Minecraft seems to be the cleanest implementation so far. I tried running Manic Digger, but it suffered performance problems, ate memory until it filled 2GB, then hung - looking at the source code, it doesn't use SVOs (the special octtree structure I worked out before the first Vol post), but instead caches uncompressed chunks of the landscape.
From my basic research, I learned that one needs:
- OctTrees (or similar) to store a 3D landscape of voxels.
- To discard parts from memory when they're no longer needed.
- A procedural landscape generator, which is repeatable, and doesn't depend on what's going on elsewhere - this is limiting, but I can see ways around it.
- [inferred] Given that the procedural aspect can be generated at any time, we have freedom to discard it as required, because we can always regenerate it on demand. We want a player to be able to dig and build in the landscape, so we'll need to store modifications in a separate octtree. The actual state of the landscape can be ascertained by reading the procedural content, and then overlaying the modification content (see the sketch after this list). Modifications can be paged out as necessary, but never deleted.
- Using conventional (straightforward) programming, adding attributes to voxels will drastically increase the memory usage for ALL voxels. We need to find a way around this (and we think we have).
- People seem to think that multiplayer is much more fun, and most developers have prioritized a networked client-server topology, though I've only seen 'shared environments' rather than any true player-to-player interaction or co-op missions. I think I'll ignore this for now, but try not to close the door on this option.
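Here is the sketch promised above: a hedged illustration (all the names are placeholders of ours) of how reading the landscape might overlay stored modifications on top of the repeatable procedural content.

function WorldMaterialAt(const Pos: TVoxelCoord): TMaterialID;
begin
  // The modification octtree holds only what the player has changed;
  // matNone (a placeholder value) means 'no modification recorded here'.
  Result := ModificationTree.MaterialAt(Pos);
  if Result = matNone then
    // Fall back to the repeatable procedural generator, which can be
    // discarded and regenerated on demand.
    Result := ProceduralMaterialAt(Pos);
end;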
Getting There Quickly
World Geometry
Step 1
As we've outlined in our earlier post on abstraction, these are the options that we'll be choosing to implement first, for renderer and world geometry respectively:
- Low-resolution software rendering,
- A simple if-then-else function to determine the contents of the world geometry, e.g. if (position in [0,0,0],[10,10,10]) or (position in [-30, -5, -30],[30, 2, 30]) then [this is rock], else if (y<=0) then [this is sea], else [this is air].
This allows us to develop player and camera movement, and basic interaction with the world geometry.
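As an illustration of that first if-then-else world function, here's a minimal sketch under our own assumptions (the boxes are the examples above; the material constants are hypothetical):

function FunctionWorldMaterialAt(X, Y, Z: Double): TMaterialID;

  function InBox(LoX, LoY, LoZ, HiX, HiY, HiZ: Double): Boolean;
  begin
    Result := (X >= LoX) and (X <= HiX) and
              (Y >= LoY) and (Y <= HiY) and
              (Z >= LoZ) and (Z <= HiZ);
  end;

begin
  if InBox(0, 0, 0, 10, 10, 10) or InBox(-30, -5, -30, 30, 2, 30) then
    Result := matRock   // the two example boxes of rock
  else if Y <= 0 then
    Result := matSea    // everything else at or below Y = 0 is sea
  else
    Result := matAir;   // and above that, air
end;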
Step 2
From there, we'll progress to higher resolution rendering (only where it maintains playable rates, and perhaps using some optimizations to achieve higher resolutions without doing too much work), and a regular array of cubes for the world geometry. Those optimizations can take on a life of their own, our favourite methods being 'local ray caching', 'progressive rendering', and picture-processing operations (borrowed from video encoding) that avoid re-casting predictable regions for a few frames. We'll try to remain focused!
Steps 3, 4
Step 3 then gives us options for terrain modification, using a single octtree along with optimizations that minimize memory usage (more on that later). Step 4 gives us the 'infinite' world using side-links to other nodes, along with the memory management utilities that enable the disposal and paging of world geometry.
Rendering
As we hinted before, there are many possible ways to render the scene. We'll concentrate our efforts on raycasting and polygonal rendering, and the hybrid methods anywhere in between. Our strategy for rendering will be to evolve through several stages, each stage being more polished than the previous stage:
- Low-resolution software renderer.
- Simple triangle geometry.
- Optimized triangle geometry.
- Optimized software renderer.
Hardware Rendering or Software Rendering?
We believe that writing a full software-driven renderer is not likely to result in a playable frame-rate; indeed to do so would need a lot of low-level optimisations that (frankly) we don't have time to implement. We'd rather get the other elements of the game working, and perhaps return to writing and heavily optimizing the rendering engine later.
Aiming for a Hybrid Approach
Our hybrid approach will be to analyse the world geometry for visible blocks using ray-casting (also face culling, frustum culling, and outsideness testing), convert those to triangle geometry, and render the triangles.
We think efficiencies can be gained by limiting the amount of ray-casting that we do, using a hierarchical scanning technique. We start by scanning a few points, which correspond to a few widely-spaced points on a grid placed on the screen (the projection plane of the camera's frustum). Say our screen has 320x200 pixels (that's wide-screen, 16:10); we might scan every 16th pixel. This image by itself won't look good, but we can use its data to render the remaining pixels smartly. We do this by interpreting the results of the coarse-grid rays to decide whether the other rays can be omitted or shortened.
Here's an example: if we look at four rays, cast from a point through the viewing plane in a square formation, we can calculate, for any given distance from the origination point, the largest size of cube that can be enclosed by those four rays without any of the rays passing through it, i.e. all four rays pass around the cube but not through it. What we're doing here is calculating the resolvable real-world resolution of a raycasting grid. It tells us, for example, that the coarse grid will start to miss cubes of a given size beyond a distance of 3 metres from the camera. If we're using octtrees, then we can obtain the 'level of detail' that a grid can resolve at any given distance, and use that to limit the work that is needed by each ray-casting operation.
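As a rough worked sketch of that calculation (simplified: it ignores the diagonal of the grid square and perspective distortion towards the screen edges, and all the parameter names are our own):

// Approximate edge length of the smallest cube that the coarse grid is
// guaranteed to notice at distance Dist, given a horizontal field of view of
// FovDegrees spread over ScreenWidth pixels, sampling every GridStep pixels.
// (Tan and DegToRad come from the Math unit.)
function MinResolvableCube(Dist, FovDegrees: Double;
                           ScreenWidth, GridStep: Integer): Double;
var
  RayAngle: Double; // angle between two adjacent coarse rays, in radians
begin
  RayAngle := DegToRad(FovDegrees) * GridStep / ScreenWidth;
  // At distance Dist the adjacent coarse rays are roughly this far apart;
  // anything smaller can slip between the four rays untouched.
  Result := Dist * Tan(RayAngle);
end;

With a 60° horizontal field of view, a 320-pixel-wide screen and a 16-pixel grid step, this comes to roughly 0.05 × Dist, so 1-metre cubes could start slipping through the coarse grid at around 19 metres; the exact figure depends entirely on the camera setup and the cube size we care about.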
This becomes exploitable when a square of four sample points from our coarse grid all hit distant scenery: it tells us that between the camera origin and the 'resolvable distance' there will be no obstructions, because any cube within that span would be large enough, relative to the ray spacing there, to get in the way of at least one of the four coarse rays. We exploit this by not testing anything in this span, choosing instead to start the fine-grid rays at the resolvable distance rather than at the origin.
There are many variations on this, exploiting different situations: they can tell us when we'll definitely hit a cube, when we're likely to hit a cube, and when we're unlikely to hit one. The most useful optimization will help us tune the level of detail according to the placement of geometry found in nearby rays.
Prototyping
Perhaps the fastest route to prototyping is to write a basic software renderer. We said at the beginning that this wouldn't be practical, but I think it will be useful to have a low-resolution renderer, because it is simpler to write than an optimized triangle pipeline.
The first rendering step is to throw some rays out from the camera position into the scene, and put coloured pixels on the screen corresponding to what those rays find. We choose which materials in the world to regard as transparent, and only return a colour when we hit something that is not transparent. If the ray travels straight through all available data without hitting anything, or reaches a distance that we regard as 'too far to display', then we return a sea or sky colour depending on the direction we're looking in.
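A minimal sketch of that per-ray loop (purely illustrative: the vector helpers, the World query and the colour constants are assumptions of ours, and a fixed step is the crudest approach; stepping node-by-node through the octtree would be better):

function CastRay(const Origin, Dir: TVector3; MaxDist, StepSize: Double): TColor;
var
  Dist: Double;
  Pos: TVector3;
  Mat: TMaterialID;
begin
  Dist := 0;
  while Dist < MaxDist do
  begin
    Pos := VectorAdd(Origin, VectorScale(Dir, Dist)); // point along the ray
    Mat := World.MaterialAt(Pos);                     // query the world geometry abstraction
    if not IsTransparent(Mat) then
      Exit(MaterialColour(Mat));                      // hit something solid: return its colour
    Dist := Dist + StepSize;
  end;
  // Nothing hit within range: sky or sea colour, depending on where we're looking.
  if Dir.Y < 0 then
    Result := clSea
  else
    Result := clSky;
end;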
Implementation - the start of coding
We've chosen to use Lazarus!
After looking around at several IDEs, languages, and platforms, I've decided upon using Lazarus + Free Pascal. I've been using an old Delphi for many years (and completed projects using it), so Lazarus offers a fairly smooth transition to a more modern IDE (compared to Delphi 2 and 6). It's cross-platform too, working on Linux, Mac, and Windows - my main OSs are Windows and Mac.
I'm not choosing Lazarus simply because it's free and a comfortable transition. I've looked at Java (+JMonkey, etc), JavaScript+HTML Canvas, conventional C/C++ + SDL + GLUT and other libraries, C# + OGRE, C# + XNA, Delphi + GL Scene, and others that I can't remember. I think my best chances are with Lazarus, assuming the compiler generates fast enough code, the IDE is stable enough, and the common APIs are well-supported.
Coding progress
At this point in coding, we've prepared about five units, to various stages of completion. Starting from high level, and working down to low level: Main, Game, World, Materials, and Measures. The Measures unit defines math structures, and sets up any global constants. More units (like View for graphics, Client and Server for command and state processing, Net for networking, and so on) will be added.
In World, we've coded most of the data structures for the first three abstractions of the world geometry: a function-based world, a 3D array world, and an octtree world (more on that in the next post). We've yet to write the internal utilities and published methods that will query and manipulate the world geometry, but we'll write a complete set for the simplest method first, and when that's working, we'll work on upgrading them behind an abstraction switch.
Monday, 14 February 2011
Abstraction
Why this approach?
A quick definition of abstraction is "creating a symbol of something, and hiding its details". The abstracted process is reduced to a descriptive symbol, and is handled in terms of that symbol, rather than having to deal with the complicated internals of the process.
This is usually a sensible approach in programming, because it makes the individual modules more robust: parts that shouldn't be visible to the rest of the program are kept secret, and communication between modules is strictly controlled, which improves the quality of the resulting program. Abstraction makes us think about how the modules will appear to a calling program, and define a minimal interface that we can implement in many different ways. In theory, by implementing an interface, we may slot in any compliant module, and the module itself takes care of all the details that make the module's functionality possible.
We want the option of getting running quickly, by implementing features at a basic level and then developing more complicated versions later. In programming terms, we're using 'stubs with switches' that let us choose between ways of implementing the functionality of the stubs. With this type of abstraction, we aim to be able to switch between alternative methods by just 'throwing a switch': setting a compiler flag, or setting an in-game option.
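As a small sketch of what we mean (the flag and function names are placeholders of ours): the rest of the game always calls the same stub, and a compiler flag decides which implementation sits behind it.

function WorldMaterialAt(X, Y, Z: Double): TMaterialID;
begin
  {$IFDEF WORLD_USE_OCTTREE}
  Result := OcttreeWorldMaterialAt(X, Y, Z);   // the full sparse-octtree implementation
  {$ELSE}
  Result := FunctionWorldMaterialAt(X, Y, Z);  // a simple function-based world, enough to get running
  {$ENDIF}
end;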
Tighter Development Cycles
Doing this means that we can quickly build a test environment, upon which we may build more detailed functionality, and commit to short development cycles because there is less code to 'get right' before the individual elements work as intended within the whole. Additionally, there is less chance of being distracted with details that are not relevant to the core game. This is especially important when we're just trying to see if the game concepts themselves will work (it's always best to reveal potential feasibility problems as quickly as possible!)
Major Abstractions
World Geometry
We aim to play the game in potentially massive worlds, but the programming to enable a very large or seemingly infinite world will take some time. If we implement the simplest structure for the world geometry, then we'll be able to quickly write the other aspects of the game. Given that the world geometry will take up lots of memory in this simple form, we'll be limited to small worlds.
Closely associated with the world geometry are memory management and streaming. Whether the world is stored as a 3D array of cubes or an octtree structure, there will be a number of functions common to both approaches, which can be used by the other parts of the game.
For example, we might want to access the block that is adjacent to the block we're currently working on, so we'd write a function to do this called GetAdjacent(). To do this, we first define the common interface (the specification that the rest of the program will use to access the geometry), as TGeometry (we're using Pascal/Delphi notation, the "T" prefix indicating a type, so this name means "this object/class is a geometry type"). Then our alternate approaches inherit from the TGeometry abstraction: TGeometryArrays, and TGeometryOcttree, which will both have a GetAdjacent(direction) function, but will implement them differently internally, because they don't use the same internal structures.
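A rough sketch of how that might look in Pascal (signatures are our own guesses; TBlockRef is a placeholder for however we end up referring to a block):

type
  TGeometry = class // the common interface the rest of the game talks to
  public
    function GetAdjacent(const Block: TBlockRef;
                         Direction: TUnitDirectionEnum): TBlockRef; virtual; abstract;
  end;

  TGeometryArrays = class(TGeometry) // backed by a plain 3D array of cubes
  public
    function GetAdjacent(const Block: TBlockRef;
                         Direction: TUnitDirectionEnum): TBlockRef; override;
  end;

  TGeometryOcttree = class(TGeometry) // backed by the sparse octtree
  public
    function GetAdjacent(const Block: TBlockRef;
                         Direction: TUnitDirectionEnum): TBlockRef; override;
  end;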
We'll also need to specify some storage methods, ready for an 'infinite worlds' model. These methods allow the game to 'generate', 'forget', and 'remember' various parts of the world as they are needed. Conventionally, this is called 'streaming', and if we do this well, it will ease the implementation of client-server networking (should we choose to branch that way).
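For instance (again, names and signatures are assumptions of ours), the streaming hooks could be added to the same abstraction, so that each geometry implementation handles its own paging:

// Possible streaming additions to TGeometry; TRegion is a placeholder for
// whatever unit of paging we settle on (a chunk, a node, a bounding box, ...).
procedure GenerateRegion(const Region: TRegion); virtual; abstract; // build from the procedural generator
procedure ForgetRegion(const Region: TRegion); virtual; abstract;   // discard from memory; regenerable later
procedure RememberRegion(const Region: TRegion); virtual; abstract; // reload stored modifications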
Rendering
There are lots of ways that a world can be rendered from its data. One approach is to convert the data into conventional triangle geometry, and render it like most conventional games would. The other approach is to ignore the conventional rendering pipeline, and write a renderer that casts its own rays into the scene, and writes pixels to the display based on what is found by each ray (there are hardware implementations of this approach, but they're mostly proprietary, and we don't have the resources to implement another abstraction level to cater for all manufacturers of graphics hardware).
We'll be doing a lot of trial-and-error testing here. If performance is good enough, it's likely that we'll adopt a hybrid approach, using low-resolution ray casting to query the world geometry and generate an optimized display list of conventional triangle geometry, and have the hardware do what it's been finely tuned to do: render triangles.