Galerkin Methods

Models can easily become so complex that they are impenetrable, unexaminable, and virtually unalterable.

— Donella Meadows (1980) "The unavoidable a priori" in: Randers J. ed., Elements of the system dynamics method. p. 27.

In our last post, we introduced the unsteady non-linear advection-diffusion-reaction equation in strong form,
$$
\mathcal{L}\left(u\right) \equiv \frac{\partial u}{\partial t} + \nabla\cdot\left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + S\left(u, \grad u\right) - f\left(\vec{x}, t\right),
$$
and weak form,
$$
\begin{eqnarray*}
\mathcal{R}\left(u, w\right) \equiv \int_{\Omega} \Bigg[-\left(\frac{\partial w}{\partial t}\right)^{T} u - \left(\nabla w\right)^{T} \cdot \left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + w^{T} S\left(u, \grad u\right) \\\ - w^{T} f\left(\vec{x}, t\right)\Bigg] dx + \int_{\partial\Omega} w^{T} \left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right]\cdot \vec{n} ds.
\end{eqnarray*}
$$

To find exact solutions of the weak form, we need to find $u \in \mathcal{W}$, such that
$$
\mathcal{R}\left(u, w\right) = 0,
$$
$\forall w \in \mathcal{W}$. In practice, $u$ only has a closed form expression for the simplest problems and $\mathcal{W}$ consists of infinitely many functions. So, finding $u$ exactly ranges from practically impossible to actually impossible for most problems that would turn up in the real world. Clearly we need some way to approximate the solution.

Galerkin Methods

The fundamental idea behind Galerkin methods is:

Rather than trying to find solutions from an infinitely large set of functions, we try to find the best approximation from a set of a finite number of functions.

Moving from an infinite number of functions to a finite number to approximate the solution is called discretization. We often call these kinds of approximate solutions discrete.

Let's say we have a set of $N$ functions that we're going to use to approximate the solution. That is, we'll try to approximate the solution by summing together weighted combinations of those $N$ functions. We're going to call that set of function $\mathcal{U}_{N}$ and all of the function that are in $\mathcal{U}_{N}$ are functions that are also in $\mathcal{W}$ (or $\mathcal{U}_{N} \subset \mathcal{W}$ in mathematical notation). If we're only using $N$ functions to approximate $u$, then we only need to use $N$ functions to test against (i.e. only $N$ functions for $w$). So, we can describe finding an approximation to the solution as, find $u_{N} \in \mathcal{U}_{N}$, such that
$$
\mathcal{R}\left(u_{N}, w_{N}\right) = 0,
$$
$\forall w_{N} \in \mathcal{W}_{N}$. Here, we've introduced $\mathcal{W}_{N}$ as a short hand for the set of $N$ functions we're testing against.

This is still fairly abstract, and depending on how we choose $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$ we can derive a wide range of different methods with different properties. We can choose whether to use the same set of functions for $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$. These two broad classes of method are called Bubnov-Galerkin methods if $\mathcal{U}_{N} = \mathcal{W}_{N}$, and Petrov-Galerkin methods if $\mathcal{U}_{N} \neq \mathcal{W}_{N}$. After that, we still need to decide what functions we have in $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$: polynomials? sines and cosines? NURBS? Are they going to be defined over the entire domain, or are they only going to be non-zero over a small portion of it? Since there are so many variations, we're going to focus in on two options that are particularly relevant to the work Cory and I do:

the continuous Galerkin (CG) method
the discontinuous Galerkin (DG) method

Preliminary: The Mesh

In general, when we have a PDE to solve, it comes with a domain ($\Omega$) — the region of the world where we're interested in obtaining a solution. Let's take the domain, and break it up into smaller sub-domains (we'll call them elements), which do not overlap, and which cover the entire domain. We call this collection of elements a mesh, and give it the symbol $\mathcal{T}_{h}$. More precisely, we're using $\mathcal{T}_{h}$ to describe the set of all the elements in the mesh. We use $h$ in the subscript to describe the fineness of the mesh, i.e. how small the elements are in some sense. If every element is the same size, $h$ could be that size, but the specifics don’t matter too much. The subscript is mostly useful to discern between two different meshes, with generally larger or smaller elements. We will also be using $\partial\mathcal{T}_{h}$ to describe the set of unique element boundaries in the mesh.

It is clear that the quality of an approximate solution is a function of the mesh that is used in constructing that approximation.

The Continuous Galerkin Method

The CG method is well suited to diffusive problems, such as,
$$
-\nabla \cdot \left(\kappa \nabla u\right) = f\left(\vec{x}\right).
$$
You can use CG for problems with advection too, but you need to do some extra work to stabilize the solutions — without stabilization, you can get some really bad solutions. Stabilized-CG methods are a little more advanced than we're aiming for with this series, so I'll leave them as something I may cover in a future post.

We're going to use our mesh $\mathcal{T}_{h}$ to define $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$. Firstly, we're going to use the same set of functions for both: $\mathcal{U}_{N} = \mathcal{W}_{N}$. Now, we're going to say that our approximate solutions should be constructed from piecewise continuous functions that are polynomials up to order $p$. We can describe this mathematically as,
$$
\mathcal{W}^{CG}_{h,p} \equiv \{ w \in \mathcal{C}^{0}\left(\Omega\right) : w|_{k} \in \mathbb{P}^{p}\left(k\right), \forall k \in \mathcal{T}_{h} \}.
$$
Reading from right-to-left, we're taking every element from our mesh ($\forall k \in \mathcal{T}_{h}$), and specifying that a function, $w$, needs to be a polynomial (of order $p$) on that element ($w|_{k} \in \mathbb{P}^{p}\left(k\right)$), and must be continuous over the entire domain ($w \in \mathcal{C}^{0}\left(\Omega\right)$). Then $\mathcal{W}^{CG}_{h,p}$ is just the set of functions. Note that rather than using a subscript $N$ for the size of the set, we're using subscripts $h$ and $p$. This is because we've defined $\mathcal{W}^{CG}_{h,p}$ relative to a mesh ($\mathcal{T}_{h}$, which gives us the $h$), and polynomial order $p$.

Now, if we take our diffusive problem from above (in strong form), multiply it by some function ($w_{h,p}$) from $\mathcal{W}^{CG}_{h,p}$, and do integration by parts, we end up with the following,
$$
\mathcal{R}^{CG}_{h,p}\left(u, w_{h,p}\right) = \sum_{k \in \mathcal{T}_{h}} \int_{k} \Bigg[\left(\nabla w_{h,p}\right)^{T} \cdot \kappa\left(u\right)\nabla u - w_{h,p}^{T} f\left(\vec{x}, t\right) dx \Bigg] - \sum_{f \in \partial\Omega} \int_{f} w_{h,p}^{T} \kappa\left(u\right)\nabla u \cdot \vec{n} ds.
$$
Now, to obtain solutions using the CG method, we seek $u_{h,p} \in \mathcal{W}^{CG}_{h,p}$ such that,
$$
\mathcal{R}^{CG}_{h,p}\left(u_{h,p}, w_{h,p}\right) = 0,
$$
$\forall w_{h,p} \in \mathcal{W}^{CG}_{h,p}$.

The CG method results in solutions that are continuous in value and up to polynomial order $p$.

The Discontinuous Galerkin Method

The DG method is well suited to advective problems, such as,
$$
\nabla \cdot \left(\vec{a} u\right) = f\left(\vec{x}\right).
$$
Again, it is possible to apply DG to problems with diffusion. But, like with CG, you need to do some extra work thats a bit beyond what we want for this series. I may do a post on how to do diffusive problems with DG in the future, but for now its worth noting that a diffusive term requires special care.

Like with CG, we're going to use our mesh $\mathcal{T}_{h}$ to define $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$, and we choose that $\mathcal{U}_{N} = \mathcal{W}_{N}$. Now, we're going to say that our approximate solutions should be constructed from functions that are polynomials up to order $p$ within each element, but can be discontinuous between elements. We can describe this mathematically as,
$$
\mathcal{W}^{DG}_{h,p} \equiv \{ w \in \mathcal{L}^{2}\left(\Omega\right) : w|_{k} \in \mathbb{P}^{p}\left(k\right), \forall k \in \mathcal{T}_{h} \}.
$$
Reading from right-to-left, we're taking every element from our mesh ($\forall k \in \mathcal{T}_{h}$), and specifying that a function, $w$, needs to be a polynomial (of order $p$) on that element ($w|_{k} \in \mathbb{P}^{p}\left(k\right)$), and must be square-integrable over the entire domain ($w \in \mathcal{L}^{2}\left(\Omega\right)$). Then $\mathcal{W}^{DG}_{h,p}$ is just the set of functions.

Let's take our advective problem from above (in strong form), and weight it by a function ($w_{h,p}$) from $\mathcal{W}^{DG}_{h,p}$, and integrate,
$$
\int_{\Omega} w_{h,p} \nabla \cdot \left(\vec{a} u\right) dx = \int_{\Omega} w_{h,p} f\left(\vec{x}\right) dx.
$$
We must be careful when we apply the integration by parts, since we're now integrating something that has jumps ($w_{h,p}$ is discontinuous between elements). So, we need to account for those jumps in the integral by including integrals along the elements boundaries too. After some algrabra, we get the following,
$$
\sum_{k \in \mathcal{T}_{h}} \Bigg[ \int_{k} -\left(\nabla w_{h,p}\right)^{T} \cdot \vec{a} u - w_{h,p}^{T} f\left(\vec{x}, t\right) dx + \sum_{f \in k} \int_{f} w_{h,p}^{T} \vec{a} u \cdot \vec{n} ds \Bigg] + \sum_{f \in \partial\Omega} \int_{f} w_{h,p}^{T} \vec{a} u \cdot \vec{n} ds = 0.
$$
At this point, its worth noting that we have multiple definitions of $u$ at element boundaries, which is a consequence of allowing it to be discontinuous between elements. This can cause issues if we're not careful. To account for this, we're going to define some kind of characteristic average state $u^{* }$ based on the states either side of the boundary, and use that instead. While we could define $u^{* }$ explicitly for this problem, it becomes difficult to find an explicit expression for $u^{* }$ for more complex problems (e.g. when $\vec{F}$ is a non-linear function of $u$). Instead, we're going to define $u^{* }$ such that,
$$
\vec{F}\left(u^{* }\right) = H\left(u^{+}, u^{-}, \vec{n}\right),
$$
where $u^{+}$ and $u^{-}$ are the values of $u$ either side of the element boundary ($\vec{n}$ points from the $+$ to the $-$ side), and $H$ is a numerical flux function. This definition seems pretty abstract (because it is), but it has the advantage that there is a lot of literature describing how to construct the numerical flux function for a range of problems. For our problem, we can define $H$ as,
$$
H\left(u^{+}, u^{-}, \vec{n}\right) = \frac{1}{2}\left|\vec{a}\cdot\vec{n}\right| \left(u^{+} + u^{-}\right) - \frac{1}{2}\left|\vec{a}\cdot\vec{n}\right| \left(u^{+} - u^{-}\right).
$$
Bringing this altogether, we obtain,
$$
\begin{eqnarray*}
\mathcal{R}^{DG}_{h,p}\left(u, w_{h,p}\right) = \sum_{k \in \mathcal{T}_{h}} \int_{k} -\left(\nabla w_{h,p}\right)^{T} \cdot \vec{a} u - w_{h,p}^{T} f\left(\vec{x}, t\right) dx + \sum_{f \in \partial\mathcal{T}_{h}} \int_{f} \left(w^{+}_{h,p} - w^{-}_{h,p}\right)^{T} H\left(u^{+}, u^{-}, \vec{n}\right) ds \\\ + \sum_{f \in \partial\Omega} \int_{f} w_{h,p}^{T} \vec{a} u \cdot \vec{n} ds.
\end{eqnarray*}
$$

Finally, to obtain solutions using the DG method, we seek $u_{h,p} \in \mathcal{W}^{DG}_{h,p}$ such that,
$$
\mathcal{R}^{DG}_{h,p}\left(u_{h,p}, w_{h,p}\right) = 0,
$$
$\forall w_{h,p} \in \mathcal{W}^{DG}_{h,p}$.

The DG method results in solutions that are discontinuous across element boundaries and up to polynomial order $p$.

The Two Temporal Discretizations

So far, I have skipped over how the Galerkin method can be applied to unsteady problems. Broadly speaking, there are two ways we can include the effect of a time derivative.

The first way we can include a time derivative is based on the following weak form form the time term,
$$
\int_{\Omega} -\left(\frac{\partial w}{\partial t}\right)^{T} u dx.
$$
In this case, we can extend our domain into the time-dimension. We would the have elements which represent spatial and temporal dimensions, and a solution $u_{h,p}\left(\vec{x},t\right)$ which is defined on that mesh in space and time. These kinds of methods are called space-time methods. To use this method we have to define $w_{h,p}\left(\vec{x},t\right)$ i.e. in space and time, and we obtain a solution in space and time at once.

The second method is based on using the following term,
$$
\int_{\Omega} w^{T}\left(\frac{\partial u}{\partial t}\right) dx.
$$
We could now approximate the time derivative as,
$$
\frac{\partial u}{\partial t} \approx \frac{u\left(\vec{x}, t + \Delta t\right) - u\left(\vec{x}, t\right)}{\Delta t}
$$
If we have the value of $u\left(\vec{x}, 0\right)$, we can find $u\left(\vec{x}, \Delta t\right)$, and then $u\left(\vec{x}, 2\Delta t\right)$ etc... To use this method we only have to define $w_{h,p}\left(\vec{x}\right)$ i.e. just in space, and we obtain a solutions at increasing times as the method progresses. This is not the only way we could approximate the time derivative using solutions at previous times. There is a large amount of literature on this for solving ordinary differential equations.

Closing Remarks

Rather than trying to find a solution to the weak form, we can find approximate solutions which are constructed from a finite number of possible functions that we choose a priori. The choice of the functions we use for approximate solutions has a significant effect on the quality of the resulting solution. Two common methods are:

The continuous Galerkin method, where we construct the solution using piecewise polynomial functions that are continuous across element boundaries.
The discontinuous Galerkin method, where we construct the solution using piecewise polynomial functions that are discontinuous across element boundaries.

The size, shape, and distribution of elements in a mesh can have a significant impact on the quality of a solution, and any quantities which are derived from it. The crux of output adaptive simulation is automatically tailoring the mesh (without user input) to obtain the highest quality outputs from a solution.

Have a topic you want a deeper dive on? Have questions? Have comments? Please reach out to me on LinkedIn or Twitter!