MESYETI's Website

Day 10 - Implementing vectors in Callisto

My December Adventure log is back, after 5 days of nothing. I started implementing vectors (dynamic arrays) in Callisto. It went horribly wrong as many Callisto bugs started to show themselves. It took a while, but eventually they were all fixed. Now to talk about vectors.

Up until now, Callisto has been about as low level and as difficult to use as C. C doesn't make using arrays on the heap that easy, and neither did Callisto. In C, this usually makes you end up just using fixed size arrays where you can, which is good as they are faster than dynamic arrays. However, if you really need a resizable array, you might get lazy and make a slow implementation without things like array capacity (which means less realloc calls - I'll explain later). Not that this matters anyway, since Callisto is pretty slow (it's not too bad though!)

How vectors work

Vectors are just arrays with a variable size, so they need to be allocated on the heap. Re-allocating every time you add something to this array can be slow - so it's good to use something called capacity. This means pre-allocating some big (not too big) amount of data in case stuff needs to be added. So, whenever something is added, instead of re-allocating, simply increase the length value and write to the already allocated memory. If there is no more capacity left (the vector length is equal to the vector capacity), then re-allocate. This means re-allocation happens much less often.

With this in mind, I can start off my vector implementation like this:

struct Vector : Array
	usize capacity
end

Array is a built-in structure in Callisto. It has a definition like this:

struct Array
	usize length
	usize memberSize
	addr  elements
end

Something I like about object orientated languages is that they usually have constructors or destructors for their classes. Things are easier when you don't have to be worrying about freeing things when they go out of scope. That's why I added them to Callisto. Here, I'm going to use them to initialise the values in the Vector structure.

implement Vector init
	let ptr Vector vec
	-> vec

	0 -> vec.length
	0 -> vec.memberSize
	0 -> vec.elements
	0 -> vec.capacity
end

implement Vector deinit
	let ptr Vector vec
	-> vec

	if vec.elements 0 = not then
		vec.elements free
	end
end

Setting the values in the vector is important because of the destructor. If the elements value is not initialised - then it has some garbage value in it. An empty array should be NULL. This is important for the destructor because freeing a garbage address will cause a fatal runtime error.

However, this initialisation is not enough. The structure doesn't yet know about the type it will hold. The user will give this information to the structure with the init_vec function:

func init_vec usize typeSize ptr Vector vec begin
	typeSize -> vec.memberSize
	1 -> vec.capacity
	0 -> vec.length
	vec alloc_vec_capacity
end

It also initialises the vector's capacity to 1. The alloc_vec_capacity function simply calls realloc with the vector's element address and the vector's capacity.

Now that we have everything needed for the Vector type itself - next a way to add data to the vector is needed. I will add a function to push values to the array:

func man vec_push cell value ptr Vector vec begin
	let ptr Vector vec
	-> vec

	if vec.length 1 + vec.capacity > then
		vec double_vec_capacity
	end

	vec.length 1 + -> vec.length
	vec.length 1 - vec a!
end

The double_vec_capacity function just doubles the vector's capacity and re-allocates. I'm not sure if doubling the capacity is always a good idea (you could run out of memory if you have an enormous amount of elements), so at some point I'll check other implementations to see what they do, but this should be fine for now.

Now to make a function that does the opposite, pops from the top of the vector.

func error vec_pop ptr Vector vec begin
	if vec.length 0 = then
		"vec_pop: Vector empty" throw
	end

	vec.length 1 - -> vec.length

	if vec.length vec.capacity 2 / < then
		vec halve_vec_capacity
	end 
end

halve_vec_capacity does the opposite of double_vec_capacity. It halves the vector's capacity and re-allocates.

2 more functions to go until I would consider this vector implementation usable. Remove and insert. vec_remove will remove an element at any index from the vector. First, all elements after the element to be removed should be moved to where the element is. So, removing index 1 from [1, 2, 3] will result in [1, 3]. The length is still the same, so just subtract 1 from it.

func error vec_remove cell index ptr Vector vec begin
	if index vec.length >= then
		c"vec_remove: Index out of bounds" throw
	end

	if index vec.length 1 - = then
		vec vec_pop
		return
	end

	index vec a^ # dest
	index 1 + vec a^ # src
	vec.length index - vec.memberSize * # n
	copy_mem

	vec.length 1 - -> vec.length

	if vec.length vec.capacity 2 / < then
		vec halve_vec_capacity
	end 
end

Insert is the opposite of this. It takes 2 parameters. First is the index to insert at, and second there's the value. The first thing to do is to move all elements from index to the end to index + 1. This leaves a space for the inserted element. After that, you simply write the inserted element there and increment the length.

func error vec_insert cell value cell index ptr Vector vec begin
	if index vec.length > then
		c"vec_insert: Index out of bounds" throw
	end

	if index vec.length = then
		value vec vec_push
		return
	end

	if vec.length 1 + vec.capacity > then
		vec double_vec_capacity
	end

	index 1 + vec.memberSize * vec.elements + # dest
	index vec a^ # src
	vec.length index - vec.memberSize * # n
	copy_mem

	vec.length 1 + -> vec.length

	value index vec a!
end

That's the end of my vector implementation. This makes Callisto much easier to use as vector and similar structures that will come in the future will make manual memory management not as much of an issue.

Links