Hello everyone! It's been a while since I last posted.

I was recently struck by inspiration and made this Gouraud shading routine.

At its simplest level, this is a modification of a texture mapping routine I wrote a while back, so I've decided to try to implement a whole library's worth of 3D primitive routines.

Here's a short list of things I plan to write and/or rework:
1. Rework functions to be passed a pointer to a struct describing the triangle and its parameters.
(a general struct whose data members are interpreted differently by each routine)

2. Rework texture mapping to work from a predefined texture page (a 256 x whatever aligned area of memory in the $D0xxxx range) so arbitrarily sized textures can be drawn.

3. Implement some level of perspective-correct drawing. At the least, this could mean:
a. Raycaster-style 'walls' with repeating textures (constant z per x line)
b. Raycaster-style 'floors' (constant z per y line) (maybe using a lookup table for each scanline?)

I don't think generic perspective correction is very viable for actual usage, but who knows at this point.

Well, that's quite interesting! I'll be sure to take a look at this marvelous assembly code.

From my experience, locking the texture to an aligned 64K block makes things a whole lot easier and faster, although I never really got around to custom-size texture repeat, so I am curious what you'll come up with.

Keep up the good work!
Shading seems to be quite interesting. I myself have been looking into gradient coloring, and this is the perfect example.

Keep us updated!
Spent some time and implemented texture paging and such.
What's changed exactly:

Stole the first 64 KB of VRAM:

D40000 - D47FFF = Draw buffer (256x128, of which 160x120 is drawn atm)
D48000 - D4FFFF = Texture page (256x128)
D50000 - D52BFF = Reserved for later
This way the C heap is left untouched, but any graphx draws have to go to the second buffer.
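
To make the layout above concrete, here is a rough C sketch of how a pixel or texel address falls out of a 256-byte-wide buffer. The macro and function names are made up for illustration; only the two base addresses come from the list above.

```c
#include <stdint.h>

/* Base addresses from the memory map above; the names are hypothetical. */
#define DRAW_BUFFER_BASE   0xD40000u  /* 256x128 draw buffer  */
#define TEXTURE_PAGE_BASE  0xD48000u  /* 256x128 texture page */

/* With a 256-byte row stride, y becomes the high byte and x the low byte
   of the 16-bit offset, so no multiplication is needed. */
static uint32_t pixel_address(uint32_t base, uint8_t x, uint8_t y)
{
    return base + ((uint32_t)y << 8) + x;
}
```

For example, the last visible pixel of the 160x120 area, (159, 119), would sit at D4779F under these assumptions.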

Also made it so the rasterizer is passed a struct describing the triangle, rather than the elements directly. Right now this is all pretty jank, so there's some cleaning up to do for sure.
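
A struct like that might look roughly like the sketch below. This is not the layout used in the actual repo; every field name here is an assumption, and the y-sort helper is only there to show the struct being passed by pointer.

```c
#include <stdint.h>

/* Hypothetical vertex: screen position plus two generic parameters. */
typedef struct {
    int16_t x, y;   /* screen-space position */
    uint8_t a, b;   /* Gouraud: a = palette index; textured: a = u, b = v */
} tri_vertex_t;

enum { TRI_GOURAUD = 0, TRI_TEXTURED = 1 };

typedef struct {
    tri_vertex_t v[3];
    uint8_t kind;   /* hypothetical: TRI_GOURAUD or TRI_TEXTURED */
} tri_desc_t;

/* Sort the three vertices top to bottom, a common first step in a rasterizer. */
static void tri_sort_by_y(tri_desc_t *t)
{
    tri_vertex_t tmp;
    if (t->v[0].y > t->v[1].y) { tmp = t->v[0]; t->v[0] = t->v[1]; t->v[1] = tmp; }
    if (t->v[1].y > t->v[2].y) { tmp = t->v[1]; t->v[1] = t->v[2]; t->v[2] = tmp; }
    if (t->v[0].y > t->v[1].y) { tmp = t->v[0]; t->v[0] = t->v[1]; t->v[1] = tmp; }
}
```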

Been thinking about what I'm planning on using this system for, and decided that forgoing perspective correction in favor of a lighting system and higher poly count would be the right call.

Speaking of lighting, maybe a linear lighting system like this could work well?

l = m / (L dot N)

where L is the (unnormalized) vector from the light to the vertex, N is the vertex normal, and m is the magnitude of the light. I don't know much about lighting systems, so any feedback would be nice.

Also, if anybody knows any ways I might optimize my functions some, that would be greatly appreciated.
Zaalane wrote:

Been thinking about what I'm planning on using this system for, and decided that forgoing perspective correction in favor of a lighting system and higher poly count would be the right call.


I feel this is the right way.

For lighting, I usually use clamp(dot(vertex_normal, light_normal),0,1)*exponent + ambient, which gives a nice infinite directional light. You can of course replace vertex_normal with the triangle normal. It is better to have both normals normalized, but that can be done beforehand during model conversion.
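
As a rough sketch of that directional formula in C (floats are used only for readability; a real implementation on this hardware would more likely use fixed point, and all names here are made up):

```c
typedef struct { float x, y, z; } vec3_t;

static float dot3(vec3_t a, vec3_t b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static float clamp01(float v)
{
    return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* Both vectors are assumed to be normalized already.
   "exponent" is the scale factor from the formula above. */
static float directional_light(vec3_t vertex_normal, vec3_t light_normal,
                               float exponent, float ambient)
{
    return clamp01(dot3(vertex_normal, light_normal)) * exponent + ambient;
}
```

A surface facing away from the light gets a negative dot product, which the clamp turns into 0, so it receives only the ambient term.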

Point lighting isn't much more complicated, but it is quite a bit heavier on resources:

light = (clamp(dot(vertex_normal, light_normal),0,1)*exponent) / length(light_position-vertex_position)
You can also restrict the "beam" of light by checking the dot product against the cosine of the beam angle and clamping appropriately (or readjusting the light range within the beam).
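
Here is a small C sketch of both ideas, under a few assumptions: the distance to the light is computed elsewhere and passed in, the cosine of the beam angle is precomputed and less than 1, and all vectors are normalized. The function names are hypothetical.

```c
typedef struct { float x, y, z; } vec3_t;

static float dot3(vec3_t a, vec3_t b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Point light: the directional term divided by the distance to the light.
   "distance" must be greater than zero. */
static float point_light(vec3_t vertex_normal, vec3_t light_normal,
                         float exponent, float distance)
{
    float d = dot3(vertex_normal, light_normal);
    if (d < 0.0f) d = 0.0f;
    if (d > 1.0f) d = 1.0f;
    return d * exponent / distance;
}

/* Beam restriction: 0 outside the cone, otherwise the range
   [cos_beam, 1] is remapped to [0, 1]. */
static float beam_factor(vec3_t beam_dir, vec3_t light_to_vertex, float cos_beam)
{
    float c = dot3(beam_dir, light_to_vertex);
    if (c <= cos_beam) return 0.0f;
    return (c - cos_beam) / (1.0f - cos_beam);
}
```

Multiplying the two results together would give a simple spotlight.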

Quote:
Also, if anybody knows any ways I might optimize my functions some, that would be greatly appreciated

I'll take a quick look, but I promise nothing.
Another quick post!

Did some optimizations to my texturing routine to avoid using stack space, but the big change was moving all the shading code to cursor RAM (E30800 - E30BFF):

E30800 -> Reserved by the graphx library
E30880 -> Gouraud shader (0x3C bytes)
E308C0 -> Textured shader (0x9E bytes)

This GIF shows the general performance. (Left number is rasterization time, right is shading.)

Forked and starred. Awesome work so far! Smile

I've made a couple commits so far.

- Stored the stack in fast RAM
- Optimized fast RAM usage
- Use "virtual at address" to more clearly assemble the relocated code

Storing the stack in fast RAM speeds up the gouraud rendering about 112% and the textured rendering about 126%.
After moving a few things around there are 416 bytes of remaining space. More if you designate less for the stack.
Assuming the first 128 bytes of fast RAM aren't used, there's also that.

fork repo: https://github.com/beckadamtheinventor/gouraud
Wow, that's way cleaner! Made a pull request and merged these changes into master. I really need to start digging into the fasmg manual.

Been rewriting some stuff to make use of a reciprocal LUT. Will make a commit soon.
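
For anyone unfamiliar with the idea, a reciprocal LUT replaces a division by a table lookup and a multiply. The sketch below is only an illustration of the technique in C, not the code from the repo; the table size, the 8.8 result format, and the names are all assumptions.

```c
#include <stdint.h>

/* recip_lut[n] holds 65536 / n, i.e. 1/n in 0.16 fixed point. */
static uint32_t recip_lut[256];

static void recip_init(void)
{
    recip_lut[0] = 0;   /* division by zero is not meaningful; the slot is unused */
    for (uint32_t n = 1; n < 256; n++) {
        recip_lut[n] = 65536u / n;
    }
}

/* Approximates num / den as a signed 8.8 fixed-point value, den in 1..255. */
static int32_t div_8_8(int32_t num, uint8_t den)
{
    return (num * (int32_t)recip_lut[den]) / 256;
}
```

An edge slope such as dx/dy = 10/4 then comes out as 640, which is 2.5 in 8.8 fixed point.
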
Another update!

Changed the shaders so they draw 4 pixels per loop iteration, with an extra loop for leftover pixels. Deltas are doubled in the loop for fewer texel accesses (at the cost of some graphical fidelity). In the quick tests I performed, this cuts the cycles per pixel down to around 28 for textured triangles and around 15 for Gouraud-shaded ones.
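
The structure of such a loop can be sketched in C as follows. This models only the Gouraud case and the 4-pixel unrolling with a leftover loop; the 8.8 fixed-point color format and the function name are assumptions, and the real routine is hand-written assembly.

```c
#include <stdint.h>

/* color and dcolor are 8.8 fixed point; the high byte is the palette index.
   count is assumed to be non-negative. */
static void fill_span_gouraud(uint8_t *dst, int count, uint16_t color, uint16_t dcolor)
{
    int blocks = count >> 2;   /* groups of four pixels */
    int rest   = count & 3;    /* 0..3 leftover pixels  */

    while (blocks-- > 0) {
        dst[0] = (uint8_t)(color >> 8); color = (uint16_t)(color + dcolor);
        dst[1] = (uint8_t)(color >> 8); color = (uint16_t)(color + dcolor);
        dst[2] = (uint8_t)(color >> 8); color = (uint16_t)(color + dcolor);
        dst[3] = (uint8_t)(color >> 8); color = (uint16_t)(color + dcolor);
        dst += 4;
    }
    while (rest-- > 0) {
        *dst++ = (uint8_t)(color >> 8);
        color = (uint16_t)(color + dcolor);
    }
}
```

The unrolled body pays the loop overhead once per four pixels instead of once per pixel.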

A quick aside: the function at $E30800 is used by the graphx function gfx_ZeroScreen() to quickly blank the screen. This space will probably be claimed later, seeing as this engine is planned only for use with C programs using a blitting/partial rewrite scheme (rather than double buffering).
Very good! My personal best is 29.5 cycles per pixel, but filtering at full rate instead of half rate. Looking at your inner loop, I think we can apply more or less the same optimization I applied in mine:


Code:
 
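ld    d, iyh      ; d = iyh, e = ixh: texel address built from
ld    e, ixh      ; the high bytes of iy and ix
ld    a, (de)     ; fetch the texel
ld    (hl), a     ; write it to two adjacent pixels
inc   l
ld    (hl), a
inc   l
exx
add   ix, sp      ; step the coordinate held in ix
add   iy, de      ; step the coordinate held in iy (delta in the shadow de)
exx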


Quite a few cycles are spent on copying the 16-bit pair to de (6 cycles per load), and ix/iy are also slower than hl on addition.

If we rearrange the pair a bit, we can put the hl:hl' pair to good use:

Code:

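add   hl, de      ; step v and the low 8 bits of u (de = deltas)
ld    a, h        ; a = integer part of v
exx
adc   hl, sp      ; step the upper 8 bits of u, plus the carry from above
ld    h, a        ; copy v into the middle byte to complete the texel address
ld    a, (hl)     ; fetch the texel
ld    (de), a     ; write it to two adjacent pixels (shadow de = screen pointer)
inc   de
ld    (de), a
inc   de
exx               ; back to the set holding v and the deltas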


We can consider hl:hl' as one aggregate of 16:16:8:8 bits, with the lower 16 bits being v, the next 16 bits being u, the following 8 bits undefined (they actually receive a copy of v), and the upper 8 bits being the texture page.
That way, we only have to copy the v register and then sample the texture.

Of course, you need to handle the possibility of a carry going into u if v+dv overflows (but that only occurs if dv is negative or v overflows the texture boundary, so it's not a big deal), and you also need to construct the pair beforehand.
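
To make the bit layout easier to follow, here is a simplified C model of one step of that scheme. It is only a sketch: the struct and function names are invented, the page byte is kept separate so it cannot be disturbed by a carry, and real code would keep all of this in registers.

```c
#include <stdint.h>

typedef struct {
    uint32_t lo;     /* 24 bits: [u fraction : v integer : v fraction] (hl)  */
    uint8_t  u_int;  /* integer part of u (low byte of hl')                  */
    uint8_t  page;   /* texture page (upper byte of hl')                     */
} texel_pos_t;

/* Steps u and v once and returns the 24-bit texel address page:v:u.
   dlo packs the v delta (low 16 bits) and the u fraction delta (upper 8 bits);
   du_int is the integer part of the u delta. */
static uint32_t step_texel(texel_pos_t *p, uint32_t dlo, uint8_t du_int)
{
    uint32_t sum   = p->lo + (dlo & 0xFFFFFFu);
    uint32_t carry = sum >> 24;                    /* carry out of the 24-bit add */
    p->lo    = sum & 0xFFFFFFu;
    p->u_int = (uint8_t)(p->u_int + du_int + carry);

    uint8_t v_int = (uint8_t)((p->lo >> 8) & 0xFFu);
    return ((uint32_t)p->page << 16) | ((uint32_t)v_int << 8) | p->u_int;
}
```

Note that an overflow of the 16-bit v field spills into the u fraction in this model too, which is the carry issue mentioned above.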

That clocks in at around 20.5 cycles per pixel at half filtering rate, and 33 cycles at full rate, in the $E30800 area.

Also, concerning the palette gradients, how are they defined? I see the Gouraud code doing a simple ramp, but does that mean we are limited to specific gradients?
Wow, that's a neat way of doing that! So de would be the dv/dx and the lower 8 bits of du/dx, and sp would be the upper 8 bits of du/dx, right?
The Gouraud gradient is simply linear through the palette, correct. Each gradient has to be predefined in the palette. I could define a version that references a custom ramp somewhere, which could allow for colored lights or something similar.
Exactly.

Ah, I see for the gradient. Maybe an R2G2B2I2 palette could be interesting?
That would be cool! I was thinking of writing a version that did a color mask over an existing polygon, which might allow for shading textures and some transparency. With these faster fill rates that might be manageable. First draw an I=0 version of the texture, then add the Gouraud values on top in a second pass (or alternatively, make use of the remaining free index registers).

For instance, an intensity-only mask allows for shading, while an RGB-only one would mean a color mask over that area of the screen, denoting a 'portal' of some kind.
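
A minimal C sketch of how such an index could be packed and masked, assuming the bit order RRGGBBII with intensity in the two low bits (the thread does not fix a layout, so this order and the function names are assumptions):

```c
#include <stdint.h>

/* Pack four 2-bit fields into one palette index: RRGGBBII. */
static uint8_t pack_rgbi(uint8_t r, uint8_t g, uint8_t b, uint8_t i)
{
    return (uint8_t)(((r & 3) << 6) | ((g & 3) << 4) | ((b & 3) << 2) | (i & 3));
}

/* Intensity-only mask: keep the color bits, replace the intensity. */
static uint8_t set_intensity(uint8_t index, uint8_t i)
{
    return (uint8_t)((index & 0xFC) | (i & 3));
}

/* RGB-only mask: keep the intensity, replace the color bits. */
static uint8_t set_color(uint8_t index, uint8_t rgb_index)
{
    return (uint8_t)((index & 0x03) | (rgb_index & 0xFC));
}
```

With this layout, the "I=0 version" of a texel is simply its index with the two low bits cleared.
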
I think it would be better to do everything in one pass; setup is still quite costly.
A simple computation in the pixel shading is better, I think, than reading and reinterpolating values. Of course, you have a wider range of possibilities with two passes.

What kind of performance do you expect / want?
Implemented the optimization TheMachine02 mentioned. Also did some optimizing of the Gouraud routine.

I'm not quite sure what I want in terms of overall performance. After this I'm going to begin actually implementing the 3D part of this. I was thinking of using a sort of portal system for rendering. What do you think would be a good maximum vertex/face count per scene? I think designing it so there's a max of ~50 triangles on screen at any one time would be good (at an average of 10 triangles per 1/60th of a second, that's around 10 fps on average).

Well, a 10 fps target is definitely good!

And 50 triangles full screen would be reasonable, although a bit on the low side, but hey, performance first. You need to count on about 1.2 to 1.5 vertices per triangle (with a nicely optimized mesh, it is about that). Vertices shouldn't be the limiting factor though; the triangle rendering really is.

Doing portal rendering, backface culling, and frustum culling always gives a big win.

On a side note, do you target full-screen room rendering or just object / starfield rendering? Optimizing these two cases can give quite different results, and you also need to take care of clipping performance in the case of room rendering.
Btw, do your clipping in 3D; it makes the 2D rendering much, much easier.

EDIT: a good test model is the Suzanne model from Blender: 507 vertices, 968 triangles. I found it a good stress test.
Can you tell me why it's better to do clipping in 3D? I'm not very familiar with clipping methods.

I was planning on using special-case code to clip the edges during rasterization, using outcodes like in Cohen-Sutherland. Then I wanted to clip those edges with a coverage buffer (like in Quake). This would eliminate overdraw, but add overhead from the clipping plus multiple lines per y. I want to do a room-based portal system, and c-buffers lend themselves well to this (occlude the area outside the portal polygon by having the only clear area be inside the polygon).
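
For reference, the outcode part of Cohen-Sutherland can be sketched in C like this. The 160x120 screen size is taken from earlier in the thread; the names are made up for the example.

```c
#include <stdint.h>

#define SCREEN_W 160
#define SCREEN_H 120

enum { OUT_LEFT = 1, OUT_RIGHT = 2, OUT_TOP = 4, OUT_BOTTOM = 8 };

/* One bit per screen edge the point lies outside of; 0 means on screen. */
static uint8_t outcode(int x, int y)
{
    uint8_t code = 0;
    if (x < 0)         code |= OUT_LEFT;
    if (x >= SCREEN_W) code |= OUT_RIGHT;
    if (y < 0)         code |= OUT_TOP;
    if (y >= SCREEN_H) code |= OUT_BOTTOM;
    return code;
}

/* Both endpoints outside the same edge: the whole edge is invisible. */
static int edge_rejected(uint8_t a, uint8_t b) { return (a & b) != 0; }

/* Both endpoints on screen: no clipping needed at all. */
static int edge_accepted(uint8_t a, uint8_t b) { return (a | b) == 0; }
```

Only edges that are neither accepted nor rejected need the slower special-case clipping code.
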
Well, clipping in 3D has two major advantages:
1 - you don't need to care about overflow when perspective-dividing your points (since only points inside the frustum get perspective projected)
2 - all 2D clipping is removed from the inner loop of the rasterizer, and you can limit your drawing to the screen range, which allows a few optimizations.

You can also apply portals in 3D btw, that doesn't change much.
The drawback is that the clipping is a tad more complex and heavier on the calculation side, but with clever hacks this can be more or less offset (outcodes are one among others, as they can (and should!) be applied in 3D).
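
As an illustration of the first point, here is a hedged C sketch of clipping one edge against a near plane before any perspective divide. The integer vector type, the plane convention (z = near_z, with larger z in front of the camera), and the names are assumptions for the example; coordinates are assumed small enough that the products fit in 32 bits.

```c
#include <stdint.h>

typedef struct { int32_t x, y, z; } vec3i_t;

/* Returns the point where edge a->b crosses the plane z = near_z.
   Assumes a.z != b.z and that near_z lies between a.z and b.z. */
static vec3i_t clip_to_near(vec3i_t a, vec3i_t b, int32_t near_z)
{
    int32_t num = near_z - a.z;
    int32_t den = b.z - a.z;
    vec3i_t r;
    r.x = a.x + (b.x - a.x) * num / den;
    r.y = a.y + (b.y - a.y) * num / den;
    r.z = near_z;
    return r;
}
```

After this step every remaining vertex has z >= near_z, so the later divide by z never sees zero or negative values.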

A coverage buffer is neat, but I never succeeded in designing a c-buffer that fits in RAM. A stencil buffer approach could also be doable, but then it is the computation that becomes harder; occlusion is always tricky stuff.
I've taken some time to think about how to organize the graphics, and have been looking at examples from older 3D games.

I think a 'tile' or maybe object-based system would be the best way to both have variety in the scene and not have to store individual polygons for every area.

I think a sector ought to have the following elements:
▪ A list of objects/tiles in the sector
▪ A list of portals to other sectors
▪ Descriptors for the lighting of the sector
▪ A localized palette for colored lighting (optionally computed at runtime)
+ other program-dependent data maybe

I think limiting the base palette to RGB222, and setting it up so only 4 unique sector palettes can be used at any one time, is honestly fine.
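
In C, a sector along those lines might be sketched as below. Nothing here is final: the field names, the types, and the single-byte lighting descriptor are placeholders chosen for illustration.

```c
#include <stdint.h>

#define MAX_SECTOR_PALETTES 4   /* only 4 unique sector palettes at any one time */

struct sector;

typedef struct {
    struct sector *target;      /* sector visible through this portal */
} portal_t;

typedef struct sector {
    const uint8_t *objects;     /* list of object/tile ids in the sector   */
    uint8_t        object_count;
    portal_t      *portals;     /* list of portals to other sectors        */
    uint8_t        portal_count;
    uint8_t        light_level; /* placeholder lighting descriptor         */
    uint8_t        palette_slot;/* which of the sector palettes to use     */
} sector_t;

/* Checks the palette limit described above. */
static int sector_is_valid(const sector_t *s)
{
    return s->palette_slot < MAX_SECTOR_PALETTES;
}
```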

Designing is a lot of fun since I get to spend way more time lying in bed thinking until I fall asleep rather than sitting at my computer and staring at Notepad++ all night.

Here's a nice YouTube video that might give you some ideas on implementation.
  