March 8th 2020

Rendering 100k spheres, instantiating and draw calls

I knew I had to learn WebGL when I saw Misaki's curl noise simulation. I'm not sure if it was the colors, the movement, or being able to zoom and rotate around the simulation. But I knew I wanted to make something like that. I just needed to figure out how...

Simulation who? A few days went by and I forgot all about it.

Welcome to my new blog; that sentence pretty much encapsulates what you'll be seeing here.

Two years after that, I managed to learn enough WebGL to get by. More recently, I stumbled upon Edan Kwan's beautiful Particle Love and loved it. But it also reminded me of that elusive curl noise simulation I wanted to make but never did.

If I don't code it now, my honor is on the line. One part of me blames Edan for that.

So it's about time to tackle particle simulations in WebGL. And I'm bringing you down this rabbit hole with me:

Not an embed because I don't want to fry your device.

Curl Simulation Series

This is the first article in a series where I'm going to explain the main techniques needed to make a pretty, interactive curl simulation:

  1. Instancing and draw calls
    Rendering 100k objects is all about communication
  2. GPGPU and FBO particle systems
    Simulating life with only a GPU
  3. Light, shadows, and extended ThreeJS materials
    Custom shaders mean more work

Each article is going to dive deep into its topic and then use it to build the curl simulation. If you aren't interested in the curl simulation itself, or don't use ThreeJS, all the concepts are still going to be useful!

These are intermediate to advanced level concepts, best understood if you are comfortable with ThreeJS: setting up a scene, and what shaders and GPUs are.

If you don't like the series, feel free to direct your hate/suggestions here.

Let's figure out the first step: how to render 100k spheres!

Using meshes to create 100k spheres

The straightforward way of rendering a lot of spheres is creating a lot of meshes in a loop and adding them to the scene:

let geometry = new THREE.SphereBufferGeometry(1, 12, 12);
let colors = [0xfafafa, 0xff0000];
for (let i = 0; i < 5000; i++) {
  let color = colors[Math.floor(Math.random() * colors.length)];
  let material = new THREE.MeshBasicMaterial({ color: color });
  let mesh = new THREE.Mesh(geometry, material);
  scene.add(mesh);
}

Note: You can find the complete code for the demo at https://github.com/Anemolo/100k-objects-with-Instanced-Geometries

Creating one mesh per sphere is simple, easy to manage, and it does the trick most of the time. But try adding as many spheres as you can without dropping below 60fps in this demo:

You can find me in this CodeSandbox!

Depending on the device you are using, your numbers will vary. On my electronic potato, I can barely make it over 7,000 spheres without dropping below 60fps. Using cubes instead of spheres helps, but only up to 13k cubes.

Still, that's a lot of spheres! But if you are planning on doing anything more than just rendering 7k spheres, that number may have to go down even further.

If you take a look at the code for Misaki's Curl Noise, Jaume's Polygon Shredder, or Edan's Particle Love, you'll notice that they all start at 16k instances. Some can go up to 4 million if your device is up to the challenge.

On top of that, they are running simulations, and even adding lights and shadows. Those are some impressive demos, and they make me a little self-conscious about ours...

Our one-mesh-per-sphere demo doesn't even reach half of that 16k instance mark. And adding a simulation or shadows is definitely out of our league. What's wrong with our code? Isn't everything I make perfect? :(

Too many draw calls

Our code has a communication issue. On every render, we're telling the GPU to draw one sphere mesh, then another, then another, then 6,997 more times, one at a time. We're making way too many draw calls.

Draw calls are orders/commands from the CPU (JavaScript) to the GPU (GLSL) saying "hey, draw this sphere". When you make a draw call, your CPU has to prepare the render state: allocating memory, binding buffers, and other processes.

Although the GPU renders pretty damn fast, the CPU is significantly slower at preparing the draw call for the GPU. And while the CPU is communicating, the GPU has to stop all its work/rendering.

This leaves us in a situation where, even though the GPU is blazing-fast at rendering, the slow old CPU limits it while preparing and sending over the data. The CPU can't feed information to the GPU fast enough. This is what the cool kids call CPU-bottlenecking, or being CPU-bound.

CPU Bottlenecking the GPU. Original https://www.wepc.com/tips/cpu-gpu-bottleneck/

Note: I didn't mention anything about the "amount of data". Of course, rendering 3 triangles is going to be faster than rendering 7,000 triangles, that's easy to figure out. But size isn't the core of the draw call issue.

What makes a draw call expensive isn't the amount of data. What makes a draw call expensive is the preparation of the GPU commands; it's the CPU-to-GPU communication.

The process of transferring and preparing the draw information from the CPU to the GPU is expensive. Draw calls have that overhead whether you draw one triangle or a thousand triangles.

Make fewer draw calls, get better performance.

Although it isn't the only way of getting better performance.

So, one big draw call is going to be significantly faster than 10 small draw calls adding up to the same amount of work. The CPU only needs to communicate with the GPU once, and the render state doesn't need to change.

Compare it to developing 1 large website with 10 pages versus 10 small single-page websites. It's the same number of pages, but setting up every new project and talking to a new client takes a bit of extra time. Making one big website is going to be faster because you have less overhead and setup time.

This is also the case for deleting files on your computer!

The amount of work is the same. The setup time adds up in the long run.
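The setup-time intuition can be sketched with a toy cost model. The two constants below are made up purely for illustration, not measured from any real GPU:

```javascript
// Toy cost model: every draw call pays a fixed CPU setup overhead
// before the GPU gets to draw anything. Both constants are illustrative.
const OVERHEAD_PER_CALL = 0.01; // ms of CPU prep per draw call
const COST_PER_SPHERE = 0.0001; // ms of GPU work per sphere

function frameTime(drawCalls, spheresPerCall) {
  const cpuSetup = drawCalls * OVERHEAD_PER_CALL;
  const gpuWork = drawCalls * spheresPerCall * COST_PER_SPHERE;
  return cpuSetup + gpuWork;
}

// Same 7,000 spheres, very different frame times:
console.log(frameTime(7000, 1).toFixed(2)); // "70.70" — the per-call overhead piles up
console.log(frameTime(1, 7000).toFixed(2)); // "0.71" — one setup, same GPU work
```

The GPU work term is identical in both cases; only the number of setups changes, and that alone is the difference between blowing a 16ms frame budget and barely registering.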

Reducing draw calls by Merging Geometries

As Renaud points out in his optimization article, one way to reduce draw calls is merging all 7,000 geometries into a single geometry. Then, instead of having 7,000 small draw calls, we have 1 big draw call. Rendering is going to be a lot faster.

let baseGeometry = new THREE.SphereBufferGeometry(1, 12, 12);
let spheres = [];
for (let i = 0; i < 7000; i++) {
  let geometry = baseGeometry.clone();
  spheres.push(geometry);
}
let material = new THREE.MeshBasicMaterial();
let mergedSphereGeometries = THREE.BufferGeometryUtils.mergeBufferGeometries(spheres);
let mesh = new THREE.Mesh(mergedSphereGeometries, material);
scene.add(mesh);

The drawback of merging geometries is that it only moves the processing time. Instead of having a slow render, we have a slow startup. Merging 7,000 geometries into a single geometry straight-up takes a lot of time. The more geometries you want to merge, the more overhead your app is going to have.

And in our case specifically, all our spheres have the same geometry. We're creating 7,000 copies of the same SphereBufferGeometry and merging them. That's a lot of memory wasted on the same thing, and a lot of repeated data sent over to the GPU.
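To see where that memory goes, here's a rough sketch of what merging does at the buffer level, with plain typed arrays standing in for Three.js geometries (a 12x12-segment sphere has a 13x13 vertex grid, so 169 vertices):

```javascript
// Merging = concatenating every clone's position buffer into one big buffer.
function mergePositions(clones) {
  const floatsPerClone = clones[0].length;
  const merged = new Float32Array(floatsPerClone * clones.length);
  clones.forEach((positions, i) => merged.set(positions, i * floatsPerClone));
  return merged;
}

const spherePositions = new Float32Array(169 * 3); // one sphere: 169 vertices, xyz each
const clones = Array.from({ length: 7000 }, () => spherePositions.slice());

const merged = mergePositions(clones);
console.log(merged.length); // 3549000 floats (~13.5 MB)
```

Every one of the 7,000 copies holds exactly the same 507 numbers. That's the repeated data the GPU ends up receiving.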

Merging geometries is a great optimization technique. But not quite what we're looking for.

Reducing draw calls by Instancing Geometries

Instancing is a technique that lets us send the geometry data to the GPU once. Then, using that same data, the GPU takes care of repeating the draw as many times as we want.

This is perfect when you want to render the same geometry a lot of times. Which is what we're looking for.

We send the data once, communicate once, and the GPU repeats the draw, going as fast as it's able to.

This means we're only making one draw call; there's minimal CPU overhead, and the GPU takes care of repeating the draw blazing fast.

Faster communication by batching all the same drawing into a single draw call

The more work you can move to the GPU, and the fewer interruptions you make, the better performance you're going to get!

Creating an Instanced Geometry

To instantiate a geometry, we first need to create our baseGeometry. Then, we create an empty InstancedBufferGeometry and .copy our geometry over.

Finally, we set instancedGeometry.maxInstancedCount to how many instances we want to render:

let baseGeometry = new THREE.SphereBufferGeometry(3);
let instancedGeometry = new THREE.InstancedBufferGeometry().copy(baseGeometry);
let instanceCount = 7000;
instancedGeometry.maxInstancedCount = instanceCount;
let material = new THREE.ShaderMaterial();
let mesh = new THREE.Mesh(instancedGeometry, material);
scene.add(mesh);

You can find me in this CodeSandbox!

Okay, there's only one sphere there...

No worries, the demo isn't broken. All 7,000 spheres are there, but they're rendered in the same position.

With regular meshes, we would change the Mesh.position to move each sphere. But now we have a single mesh with an instanced geometry.

Since the instances are repeated in the GPU, we also have to position them in the GPU. This means GLSL, the vertex shader.

We're going to move our positioning logic over to the vertex shader. But first, we need to give the shaders the data needed to calculate the position of each instance.

Note: You could also use the recently added InstancedMesh for this, which is a bit easier to use.

Instanced Buffer Attributes

A Buffer Attribute is an array with a bunch of values describing properties of each vertex in a geometry.

Imagine a square geometry with 4 vertices, one at each corner. Each corner/vertex has its own position, normal, and UV values. Those are all properties of the vertices, and they're built-in attributes of our imaginary square geometry.

When the GPU renders each vertex, it passes the corresponding attributes of that vertex to the shaders.

Buffer attributes have one value (or vector) per vertex. The shader uses them to render

On the other hand, we have Instanced Buffer Attributes. They're also arrays with a bunch of values, but they hold properties of each instance instead. All vertices of the same instance share the same instanced value.

When the GPU renders each instance, it uses the corresponding instanced value for all of that instance's vertices. This allows us to render each instance slightly differently.

How Instanced buffer attributes affect instances
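The size difference between the two kinds of attributes is easy to see with plain numbers (again assuming our 12x12-segment sphere with 169 vertices):

```javascript
// A regular attribute stores one value per VERTEX.
// An instanced attribute stores one value per INSTANCE.
const vertexCount = 169; // vertices in one 12x12-segment sphere
const instanceCount = 7000;

// position: a vec3 per vertex — uploaded once, shared by every instance
const position = new Float32Array(vertexCount * 3);
// aColor: a vec3 per instance — one color shared by all 169 vertices of that instance
const aColor = new Float32Array(instanceCount * 3);

console.log(position.length); // 507
console.log(aColor.length); // 21000
```

Compare that to the merged-geometry approach, where the 507 position floats get duplicated 7,000 times.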

Creating Instanced Buffer Attributes

Creating an InstancedBufferAttribute works the same as creating regular BufferAttributes, but your geometry needs to be an InstancedBufferGeometry.

Let's give each instance its own color instanced attribute:

  1. Create an array with values for all instances. In our case, we'll pick a random color and add its RGB components to the array
  2. Transform the array to a Float32Array
  3. Create the InstancedBufferAttribute and add it to our instanced geometry. We'll use the three RGB components of the color, so the buffer attribute is going to be size 3.
// 1. Create the values for each instance
let aColor = [];
let colors = [new THREE.Color("#ff3030"), new THREE.Color("#121214")];
for (let i = 0; i < instanceCount; i++) {
  let color = colors[Math.floor(Math.random() * colors.length)];
  aColor.push(color.r, color.g, color.b);
}
// 2. Transform the array to a Float32Array
let aColorFloat32 = new Float32Array(aColor);
// 3. Create the instanced Buffer Attribute of size three
instancedGeometry.addAttribute(
  "aColor",
  new THREE.InstancedBufferAttribute(aColorFloat32, 3, false)
);

To calculate our sphere's curve position, we're going to need a few different values:

  • The X and Y radius of the curve
  • The Z curve offset, to give it a bit of thickness instead of having it flat
  • The progress of the sphere through the curve
  • The travel speed of the sphere

That's quite a few parameters! We could create one InstancedBufferAttribute for each value, but each attribute creates a WebGL call; that's 4 WebGL calls in total. Just like draw calls, all other WebGL calls are also CPU-to-GPU communication.

Instead, we're going to create a single InstancedBufferAttribute of vector size 4, aCurve, and batch all our properties in there. That leaves us with only one WebGL call!

"Any time you can reduce the amount of JS code that gets executed and ESPECIALLY limit the number of calls JavaScript makes into the WebGL context, the better." — @thrax

let aColor = [];
let aCurve = [];
let colors = [new THREE.Color("#ff3030"), new THREE.Color("#121214")];
for (let i = 0; i < instanceCount; i++) {
  let radius = 30 + Math.random() * 10;    // between 30 and 40
  let zOffset = -5 + Math.random() * 10;   // between -5 and 5
  let progress = Math.random();
  let speed = 0.02 + Math.random() * 0.05; // between 0.02 and 0.07
  aCurve.push(radius, progress, zOffset, speed);
  let color = colors[Math.floor(Math.random() * colors.length)];
  aColor.push(color.r, color.g, color.b);
}
let aCurveFloat32 = new Float32Array(aCurve);
instancedGeometry.addAttribute(
  "aCurve",
  new THREE.InstancedBufferAttribute(aCurveFloat32, 4, false)
);
let aColorFloat32 = new Float32Array(aColor);
instancedGeometry.addAttribute(
  "aColor",
  new THREE.InstancedBufferAttribute(aColorFloat32, 3, false)
);

Note: Adding a lowercase "a" prefix to my variables helps me identify buffer attributes more easily.

By themselves, these BufferAttributes only store data in the GPU. If the shaders don't make use of the attributes, they don't do anything. We need to add our custom positioning logic to the shaders and make use of the attributes.

Animating in the Shaders

Calculating the position animation in the shader isn't only needed to move our instances, it's also more performant.

When animating Mesh.position on every tick, the CPU uploads a new transformation matrix to the GPU. That causes a lot of CPU-to-GPU communication, CPU-bottlenecking our application.

If we give the GPU (shaders) the tools to calculate the animation, there's no need for such heavy communication, and our GPU can run at full speed. This is the principle behind libraries like Three.Bas.

Let's go ahead and animate the instances in the shaders using our new aCurve attribute:

  1. Define the aCurve attribute at the top
  2. (optional) Extract the values into separate variables to make them easier to read
  3. Calculate the curve position and add it to the final position
// 1. Define the attributes
attribute vec4 aCurve;

// GLSL has no built-in PI, so define it ourselves
#define PI 3.14159265358979

// Sphere positioning logic
vec3 getCurvePosition(float progress, float radius, float offset){
  vec3 pos = vec3(0.);
  pos.x += cos(progress * PI * 8.) * radius;
  pos.y += sin(progress * PI * 8.) * radius + sin(progress * PI * 2.) * 30.;
  pos.z += progress * 200. - 200. / 2. + offset;
  return pos;
}

void main(){
  vec3 transformed = position;

  // 2. Extract values from the attribute
  float aRadius = aCurve.x;
  float aProgress = aCurve.y;
  float aZOffset = aCurve.z;
  float aSpeed = aCurve.w;

  // 3. Get the curve position and add it to the final position
  vec3 curvePosition = getCurvePosition(aProgress, aRadius, aZOffset);
  transformed += curvePosition;
  gl_Position = projectionMatrix * modelViewMatrix * vec4(transformed, 1.);
}
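For intuition, here's the shader's curve function transcribed to plain JavaScript (not part of the demo, just the same math): progress * PI * 8 winds four full turns around the curve, the extra sine wave bobs it up and down, and z stretches the whole thing from -100 to 100.

```javascript
// JavaScript transcription of the vertex shader's getCurvePosition
function getCurvePosition(progress, radius, offset) {
  return {
    x: Math.cos(progress * Math.PI * 8) * radius,
    y: Math.sin(progress * Math.PI * 8) * radius + Math.sin(progress * Math.PI * 2) * 30,
    z: progress * 200 - 200 / 2 + offset,
  };
}

// A sphere at progress 0 sits at the start of the curve:
console.log(getCurvePosition(0, 35, 0)); // { x: 35, y: 0, z: -100 }
```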

Let's also add the aColor attribute to give each sphere its color. Since the fragment shader can't read attributes, we'll need to send aColor from the vertex shader to the fragment shader using a varying, vColor.

attribute vec3 aColor;
varying vec3 vColor;
void main(){
  // ...
  vColor = aColor;
}

Then, we can actually use it in the fragment shader:

varying vec3 vColor;
void main(){
  gl_FragColor = vec4(vColor, 1.);
}

Let's add these new shaders to our ShaderMaterial and see how it looks:

let material = new THREE.ShaderMaterial({
  fragmentShader: fragmentShader,
  vertexShader: vertexShader
});

Note: In this demo we're doing very simple things with our attributes; we're only calculating position and color. But you can do pretty much anything you want with them, like using attributes to give each instance a different easing.

Let's fire up that demo again, but this time starting at 40k spheres.

Yet another demo with a CodeSandbox!

Not bad at all! On my desktop, I can go up to 100k spheres without dropping frames.

And now that the positioning logic is over in the shaders, we can control the animation with a handful of uniforms. Let's add uTime and make the spheres move through the curve in the vertex shader:

let material = new THREE.ShaderMaterial({
  fragmentShader: fragmentShader,
  vertexShader: vertexShader,
  uniforms: {
    uTime: new THREE.Uniform(0)
  }
});

uniform float uTime;
attribute vec4 aCurve;
attribute vec3 aColor;
varying vec3 vColor;

vec3 getCurvePosition(float progress, float radius, float offset){
  // ...
}

void main(){
  vec3 transformed = position;
  float aRadius = aCurve.x;
  float aZOffset = aCurve.z;
  float aSpeed = aCurve.w;
  // Advance the progress with time, wrapping back to 0 with mod()
  float aProgress = mod(aCurve.y + uTime * aSpeed, 1.);
  vec3 curvePosition = getCurvePosition(aProgress, aRadius, aZOffset);
  transformed += curvePosition;
  gl_Position = projectionMatrix * modelViewMatrix * vec4(transformed, 1.);
  vColor = aColor;
}
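One thing the snippet doesn't show is that uTime has to be ticked forward on every frame. In Three.js, a uniform is just an object with a .value property, so the render loop only needs to bump it before rendering. A minimal sketch, with a plain object standing in for material.uniforms:

```javascript
// Advancing uTime each frame; in the real demo you'd read the elapsed
// time from a THREE.Clock and then call renderer.render(scene, camera).
const uniforms = { uTime: { value: 0 } };

function tick(elapsedSeconds) {
  uniforms.uTime.value = elapsedSeconds; // the GPU sees the new value on the next draw
}

tick(1.5);
console.log(uniforms.uTime.value); // 1.5
```

Updating one float per frame is about as little CPU-to-GPU chatter as an animation can have.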

The final CodeSandbox of the article! And also the GitHub repo.

And there we go! We're rendering 100k animated spheres! Our demo is done!

If you are interested in doing more complex animations using the things we learned in this article, I added some extra effects in the demo for you to explore.

Recap & Closing Thoughts

Let's go over what we learned through the article:

  1. Creating 100k meshes has bad performance because it creates too many draw calls.
  2. Draw calls are CPU-to-GPU communication. Drawing the objects is fast, but communicating is slow. Fewer draw calls, better performance.
  3. Merging geometries into a single mesh reduces the draw calls to one, but adds CPU overhead depending on the size of the geometries. Merging geometries takes time.
  4. Instanced geometries send the base geometry once, and the GPU takes care of drawing as many copies as you want: a single draw call and minimal data. It only works if all instances share the same geometry.
  5. InstancedBufferAttributes store per-instance data and are used in the shaders (GPU).
  6. Animating in the shaders gives us better performance because we minimize CPU-to-GPU communication.

Instancing and animating in the shaders is a bit cumbersome, and pretty different from just using meshes and animating the matrix. But it gives us a really good performance boost: from 7,000 spheres with meshes to 100k spheres with instanced geometries!

In the next article of the series, we're going to take advantage of this new-found performance to build a full-on curl simulation!

If you enjoyed the article, or would like to nerd out about WebGL, feel free to reach out on Twitter!

Next in series: Curl simulation & FBO (coming soon)

Further research

Note: Thanks to Matrin for helping me out with feedback. And to mario for keeping me sane while I write the articles<3
