Ben Couchman

Galerkin Methods

Ben Couchman — Fri, 19 Feb 2021 19:50:31 GMT

Models can easily become so complex that they are impenetrable, unexaminable, and virtually unalterable.

— Donella Meadows (1980) "The unavoidable a priori" in: Randers J. ed., Elements of the system dynamics method. p. 27.

In our last post, we introduced the unsteady non-linear advection-diffusion-reaction equation in strong form,
$$
\mathcal{L}\left(u\right) \equiv \frac{\partial u}{\partial t} + \nabla\cdot\left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + S\left(u, \grad u\right) - f\left(\vec{x}, t\right),
$$
and weak form,
$$
\begin{eqnarray*}
\mathcal{R}\left(u, w\right) \equiv \int_{\Omega} \Bigg[-\left(\frac{\partial w}{\partial t}\right)^{T} u - \left(\nabla w\right)^{T} \cdot \left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + w^{T} S\left(u, \grad u\right) \\\ - w^{T} f\left(\vec{x}, t\right)\Bigg] dx + \int_{\partial\Omega} w^{T} \left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right]\cdot \vec{n} ds.
\end{eqnarray*}
$$

To find exact solutions of the weak form, we need to find $u \in \mathcal{W}$, such that
$$
\mathcal{R}\left(u, w\right) = 0,
$$
$\forall w \in \mathcal{W}$. In practice, $u$ only has a closed form expression for the simplest problems and $\mathcal{W}$ consists of infinitely many functions. So, finding $u$ exactly ranges from practically impossible to actually impossible for most problems that would turn up in the real world. Clearly we need some way to approximate the solution.

Galerkin Methods

The fundamental idea behind Galerkin methods is:

Rather than trying to find solutions from an infinitely large set of functions, we try to find the best approximation from a set of a finite number of functions.

Moving from an infinite number of functions to a finite number to approximate the solution is called discretization. We often call these kinds of approximate solutions discrete.

Let's say we have a set of $N$ functions that we're going to use to approximate the solution. That is, we'll try to approximate the solution by summing together weighted combinations of those $N$ functions. We're going to call that set of function $\mathcal{U}_{N}$ and all of the function that are in $\mathcal{U}_{N}$ are functions that are also in $\mathcal{W}$ (or $\mathcal{U}_{N} \subset \mathcal{W}$ in mathematical notation). If we're only using $N$ functions to approximate $u$, then we only need to use $N$ functions to test against (i.e. only $N$ functions for $w$). So, we can describe finding an approximation to the solution as, find $u_{N} \in \mathcal{U}_{N}$, such that
$$
\mathcal{R}\left(u_{N}, w_{N}\right) = 0,
$$
$\forall w_{N} \in \mathcal{W}_{N}$. Here, we've introduced $\mathcal{W}_{N}$ as a short hand for the set of $N$ functions we're testing against.

This is still fairly abstract, and depending on how we choose $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$ we can derive a wide range of different methods with different properties. We can choose whether to use the same set of functions for $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$. These two broad classes of method are called Bubnov-Galerkin methods if $\mathcal{U}_{N} = \mathcal{W}_{N}$, and Petrov-Galerkin methods if $\mathcal{U}_{N} \neq \mathcal{W}_{N}$. After that, we still need to decide what functions we have in $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$: polynomials? sines and cosines? NURBS? Are they going to be defined over the entire domain, or are they only going to be non-zero over a small portion of it? Since there are so many variations, we're going to focus in on two options that are particularly relevant to the work Cory and I do:

the continuous Galerkin (CG) method
the discontinuous Galerkin (DG) method

Preliminary: The Mesh

In general, when we have a PDE to solve, it comes with a domain ($\Omega$) — the region of the world where we're interested in obtaining a solution. Let's take the domain, and break it up into smaller sub-domains (we'll call them elements), which do not overlap, and which cover the entire domain. We call this collection of elements a mesh, and give it the symbol $\mathcal{T}_{h}$. More precisely, we're using $\mathcal{T}_{h}$ to describe the set of all the elements in the mesh. We use $h$ in the subscript to describe the fineness of the mesh, i.e. how small the elements are in some sense. If every element is the same size, $h$ could be that size, but the specifics don’t matter too much. The subscript is mostly useful to discern between two different meshes, with generally larger or smaller elements. We will also be using $\partial\mathcal{T}_{h}$ to describe the set of unique element boundaries in the mesh.

It is clear that the quality of an approximate solution is a function of the mesh that is used in constructing that approximation.

The Continuous Galerkin Method

The CG method is well suited to diffusive problems, such as,
$$
-\nabla \cdot \left(\kappa \nabla u\right) = f\left(\vec{x}\right).
$$
You can use CG for problems with advection too, but you need to do some extra work to stabilize the solutions — without stabilization, you can get some really bad solutions. Stabilized-CG methods are a little more advanced than we're aiming for with this series, so I'll leave them as something I may cover in a future post.

We're going to use our mesh $\mathcal{T}_{h}$ to define $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$. Firstly, we're going to use the same set of functions for both: $\mathcal{U}_{N} = \mathcal{W}_{N}$. Now, we're going to say that our approximate solutions should be constructed from piecewise continuous functions that are polynomials up to order $p$. We can describe this mathematically as,
$$
\mathcal{W}^{CG}_{h,p} \equiv \{ w \in \mathcal{C}^{0}\left(\Omega\right) : w|_{k} \in \mathbb{P}^{p}\left(k\right), \forall k \in \mathcal{T}_{h} \}.
$$
Reading from right-to-left, we're taking every element from our mesh ($\forall k \in \mathcal{T}_{h}$), and specifying that a function, $w$, needs to be a polynomial (of order $p$) on that element ($w|_{k} \in \mathbb{P}^{p}\left(k\right)$), and must be continuous over the entire domain ($w \in \mathcal{C}^{0}\left(\Omega\right)$). Then $\mathcal{W}^{CG}_{h,p}$ is just the set of functions. Note that rather than using a subscript $N$ for the size of the set, we're using subscripts $h$ and $p$. This is because we've defined $\mathcal{W}^{CG}_{h,p}$ relative to a mesh ($\mathcal{T}_{h}$, which gives us the $h$), and polynomial order $p$.

Now, if we take our diffusive problem from above (in strong form), multiply it by some function ($w_{h,p}$) from $\mathcal{W}^{CG}_{h,p}$, and do integration by parts, we end up with the following,
$$
\mathcal{R}^{CG}_{h,p}\left(u, w_{h,p}\right) = \sum_{k \in \mathcal{T}_{h}} \int_{k} \Bigg[\left(\nabla w_{h,p}\right)^{T} \cdot \kappa\left(u\right)\nabla u - w_{h,p}^{T} f\left(\vec{x}, t\right) dx \Bigg] - \sum_{f \in \partial\Omega} \int_{f} w_{h,p}^{T} \kappa\left(u\right)\nabla u \cdot \vec{n} ds.
$$
Now, to obtain solutions using the CG method, we seek $u_{h,p} \in \mathcal{W}^{CG}_{h,p}$ such that,
$$
\mathcal{R}^{CG}_{h,p}\left(u_{h,p}, w_{h,p}\right) = 0,
$$
$\forall w_{h,p} \in \mathcal{W}^{CG}_{h,p}$.

The CG method results in solutions that are continuous in value and up to polynomial order $p$.

The Discontinuous Galerkin Method

The DG method is well suited to advective problems, such as,
$$
\nabla \cdot \left(\vec{a} u\right) = f\left(\vec{x}\right).
$$
Again, it is possible to apply DG to problems with diffusion. But, like with CG, you need to do some extra work thats a bit beyond what we want for this series. I may do a post on how to do diffusive problems with DG in the future, but for now its worth noting that a diffusive term requires special care.

Like with CG, we're going to use our mesh $\mathcal{T}_{h}$ to define $\mathcal{U}_{N}$ and $\mathcal{W}_{N}$, and we choose that $\mathcal{U}_{N} = \mathcal{W}_{N}$. Now, we're going to say that our approximate solutions should be constructed from functions that are polynomials up to order $p$ within each element, but can be discontinuous between elements. We can describe this mathematically as,
$$
\mathcal{W}^{DG}_{h,p} \equiv \{ w \in \mathcal{L}^{2}\left(\Omega\right) : w|_{k} \in \mathbb{P}^{p}\left(k\right), \forall k \in \mathcal{T}_{h} \}.
$$
Reading from right-to-left, we're taking every element from our mesh ($\forall k \in \mathcal{T}_{h}$), and specifying that a function, $w$, needs to be a polynomial (of order $p$) on that element ($w|_{k} \in \mathbb{P}^{p}\left(k\right)$), and must be square-integrable over the entire domain ($w \in \mathcal{L}^{2}\left(\Omega\right)$). Then $\mathcal{W}^{DG}_{h,p}$ is just the set of functions.

Let's take our advective problem from above (in strong form), and weight it by a function ($w_{h,p}$) from $\mathcal{W}^{DG}_{h,p}$, and integrate,
$$
\int_{\Omega} w_{h,p} \nabla \cdot \left(\vec{a} u\right) dx = \int_{\Omega} w_{h,p} f\left(\vec{x}\right) dx.
$$
We must be careful when we apply the integration by parts, since we're now integrating something that has jumps ($w_{h,p}$ is discontinuous between elements). So, we need to account for those jumps in the integral by including integrals along the elements boundaries too. After some algrabra, we get the following,
$$
\sum_{k \in \mathcal{T}_{h}} \Bigg[ \int_{k} -\left(\nabla w_{h,p}\right)^{T} \cdot \vec{a} u - w_{h,p}^{T} f\left(\vec{x}, t\right) dx + \sum_{f \in k} \int_{f} w_{h,p}^{T} \vec{a} u \cdot \vec{n} ds \Bigg] + \sum_{f \in \partial\Omega} \int_{f} w_{h,p}^{T} \vec{a} u \cdot \vec{n} ds = 0.
$$
At this point, its worth noting that we have multiple definitions of $u$ at element boundaries, which is a consequence of allowing it to be discontinuous between elements. This can cause issues if we're not careful. To account for this, we're going to define some kind of characteristic average state $u^{* }$ based on the states either side of the boundary, and use that instead. While we could define $u^{* }$ explicitly for this problem, it becomes difficult to find an explicit expression for $u^{* }$ for more complex problems (e.g. when $\vec{F}$ is a non-linear function of $u$). Instead, we're going to define $u^{* }$ such that,
$$
\vec{F}\left(u^{* }\right) = H\left(u^{+}, u^{-}, \vec{n}\right),
$$
where $u^{+}$ and $u^{-}$ are the values of $u$ either side of the element boundary ($\vec{n}$ points from the $+$ to the $-$ side), and $H$ is a numerical flux function. This definition seems pretty abstract (because it is), but it has the advantage that there is a lot of literature describing how to construct the numerical flux function for a range of problems. For our problem, we can define $H$ as,
$$
H\left(u^{+}, u^{-}, \vec{n}\right) = \frac{1}{2}\left|\vec{a}\cdot\vec{n}\right| \left(u^{+} + u^{-}\right) - \frac{1}{2}\left|\vec{a}\cdot\vec{n}\right| \left(u^{+} - u^{-}\right).
$$
Bringing this altogether, we obtain,
$$
\begin{eqnarray*}
\mathcal{R}^{DG}_{h,p}\left(u, w_{h,p}\right) = \sum_{k \in \mathcal{T}_{h}} \int_{k} -\left(\nabla w_{h,p}\right)^{T} \cdot \vec{a} u - w_{h,p}^{T} f\left(\vec{x}, t\right) dx + \sum_{f \in \partial\mathcal{T}_{h}} \int_{f} \left(w^{+}_{h,p} - w^{-}_{h,p}\right)^{T} H\left(u^{+}, u^{-}, \vec{n}\right) ds \\\ + \sum_{f \in \partial\Omega} \int_{f} w_{h,p}^{T} \vec{a} u \cdot \vec{n} ds.
\end{eqnarray*}
$$

Finally, to obtain solutions using the DG method, we seek $u_{h,p} \in \mathcal{W}^{DG}_{h,p}$ such that,
$$
\mathcal{R}^{DG}_{h,p}\left(u_{h,p}, w_{h,p}\right) = 0,
$$
$\forall w_{h,p} \in \mathcal{W}^{DG}_{h,p}$.

The DG method results in solutions that are discontinuous across element boundaries and up to polynomial order $p$.

The Two Temporal Discretizations

So far, I have skipped over how the Galerkin method can be applied to unsteady problems. Broadly speaking, there are two ways we can include the effect of a time derivative.

The first way we can include a time derivative is based on the following weak form form the time term,
$$
\int_{\Omega} -\left(\frac{\partial w}{\partial t}\right)^{T} u dx.
$$
In this case, we can extend our domain into the time-dimension. We would the have elements which represent spatial and temporal dimensions, and a solution $u_{h,p}\left(\vec{x},t\right)$ which is defined on that mesh in space and time. These kinds of methods are called space-time methods. To use this method we have to define $w_{h,p}\left(\vec{x},t\right)$ i.e. in space and time, and we obtain a solution in space and time at once.

The second method is based on using the following term,
$$
\int_{\Omega} w^{T}\left(\frac{\partial u}{\partial t}\right) dx.
$$
We could now approximate the time derivative as,
$$
\frac{\partial u}{\partial t} \approx \frac{u\left(\vec{x}, t + \Delta t\right) - u\left(\vec{x}, t\right)}{\Delta t}
$$
If we have the value of $u\left(\vec{x}, 0\right)$, we can find $u\left(\vec{x}, \Delta t\right)$, and then $u\left(\vec{x}, 2\Delta t\right)$ etc... To use this method we only have to define $w_{h,p}\left(\vec{x}\right)$ i.e. just in space, and we obtain a solutions at increasing times as the method progresses. This is not the only way we could approximate the time derivative using solutions at previous times. There is a large amount of literature on this for solving ordinary differential equations.

Closing Remarks

Rather than trying to find a solution to the weak form, we can find approximate solutions which are constructed from a finite number of possible functions that we choose a priori. The choice of the functions we use for approximate solutions has a significant effect on the quality of the resulting solution. Two common methods are:

The continuous Galerkin method, where we construct the solution using piecewise polynomial functions that are continuous across element boundaries.
The discontinuous Galerkin method, where we construct the solution using piecewise polynomial functions that are discontinuous across element boundaries.

The size, shape, and distribution of elements in a mesh can have a significant impact on the quality of a solution, and any quantities which are derived from it. The crux of output adaptive simulation is automatically tailoring the mesh (without user input) to obtain the highest quality outputs from a solution.

Have a topic you want a deeper dive on? Have questions? Have comments? Please reach out to me on LinkedIn or Twitter!

Introducing the Strong and Weak Forms

Ben Couchman — Tue, 09 Feb 2021 15:42:49 GMT

No good ending can be expected in the absence of the right beginning.

— I Ching

Before we can make our predictions better, we need to be able to make predictions in the first place. For physical systems, this means we need a mathematical model of how the system works. Once we have a set of equations that describe the system, we can get approximate solutions to them and make predictions about how the system will behave. It doesn't matter if we're looking at how fluids flow, how electromagnetic waves propagate, or how a building will flex in the wind; the first step is turning it into math.

Our Principal Equation

Fortunately, a lot of problems can be described by one family of equations. Unfortunately, they're partial differential equations (PDEs), which means its rare that analytic solutions exists for all but the simplest problems. This family of equations has a few names but we will call it the unsteady non-linear advection-diffusion-reaction equation,

$$
\frac{\partial u}{\partial t} + \nabla\cdot\left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + S\left(u, \grad u\right) - f\left(\vec{x}, t\right) = 0.
$$

This is an equation for $u\left(\vec{x}, t\right)$, where $\vec{x}$ is any point in the $domain$ of the PDE i.e. the space where the PDE is defined. For example, if we were looking at how water in a river was flowing, the domain would be the river we're interested in. It wouldn't make sense to try and solve "how the river is flowing" in any impermeable rock nearby. We will refer to the domain of the PDE with $\Omega$, and the boundaries of the domain as $\partial\Omega$.

By changing the definitions of $\vec{F}$, $\kappa$, $S$, and $f$ we can change the problem we're modeling. Any of these terms can be zero. Whether a term is present of not can have significant effects on the behavior of the solutions.

Let's take a look at this equation term-by-term:

$\frac{\partial u}{\partial t}$ — This term represents that the value of $u$ at a specific point in our domain may change in time. It is the presence of this term which makes the equation unsteady.
$\nabla\cdot\vec{F}\left(u\right)$ — This term is called the advective term, and usually represents the movement of $u$ about the domain by some bulk motion. Dye being carried downstream in a river would be represented by this term.
$\nabla\cdot\left[\kappa\left(u\right)\nabla u\right]$ — This term is the diffusion term. It represents the change in $u$ due to gradients. The change in temperature as you're defrosting a Thanksgiving turkey would be represented by this term.
$S\left(u, \grad u\right)$ — This is term is reaction term (although we will call it the source term). It represents changes in $u$ that are a function of $u$ (and its gradient) at a specific point in the domain. Chemical reactions would be represented by this term.
$f\left(\vec{x}, t\right)$ — This is the forcing term, and represents changes to $u$ that are independent of its value.

We can also think of $u$ as a vector of multiple variables, or as a single variable. Thinking of $u$ representing a vector is convenient if there are multiple PDEs of interest on the same domain, especially if they're coupled. In this case, we can think of each variable as a component of $u$, and each equation contributing to the components of $\vec{F}$, $\kappa\left(u\right)$ etc. Since we are dealing with spatial vectors and solution vectors, for the dot product we will use $a \cdot b$ if $a$ and $b$ are spatial vectors, and $a^{T} b$ if they are solution vectors.

The unsteady non-linear advection-diffusion-reaction equation is so powerful because it can describe (almost) any conservation law. That is, if you have a system that can be thought of as balancing the net transport of a quantity into a region with the change in that quantity with with time, and any creation or destruction if it, then you can probably use this equation to describe it. This includes:

Fluid flows
The temperature of solids, or fluids
The stress and strain in solids
Pollutant concentrations
Oil moving through porous rock

and many more.

The Strong Form and its Limitations

PDEs can generally be written in two forms: strong form, and weak form. We wrote the unsteady non-linear advection-diffusion-reaction equation above in strong form. The problem with using the strong form is that it actually prevents you from accessing certain solutions. To demonstrate this, lets think about the shape of a string under tension with fixed ends, and a distributed load applied to it. If the string is stationary, we can describe the deflection of the string ($u$) as,

$$
-\frac{d^{2} u}{dx^{2}} = \frac{p}{T},
$$

where $T$ is the tension of the string and $p$ is the distributed load (i.e. force per unit length) on the string. We will assume that the string is fixed at $x = 0$ and $x = L$. Let's imagine we apply a distributed load (say $P$) over the middle half of the string only. Anywhere where we're not applying $P$, the string will be straight because $\frac{d^{2} u}{dx^{2}} = 0$, and anywhere we are applying $P$ the string will curve. Now let's double the distributed load ($2P$), but only apply it over the middle quarter of the string — more of the string will be straight, and the curvature of the middle quarter will be larger. We can repeat this process, doubling the distributed load but halving the portion of the string we apply it over. Each time, more of the string will be straight, and the curvature of the string in the middle will increase.

What happens if we repeat this process and infinite number of times? We have an infinitely large distributed load over an infinitely small length of string. We now have $\frac{d^{2} u}{dx^{2}} = \infty$ at the point at the center of the string, and $\frac{d^{2} u}{dx^{2}} = 0$ everywhere else. Intuitively, we expect the string to have two straight section with a "kink" in the middle. So, how do we solve $\frac{d^{2} u}{dx^{2}} = \infty$ at that center point where the kink is? The answer is, we can't. To solve this PDE, we can only get solutions where $\frac{d^{2} u}{dx^{2}}$ is finite... so no strings with kinks. We call the set of functions where $\frac{d^{2} u}{dx^{2}}$ is finite $\mathcal{C}^{2}$ — $\mathcal{C}$ for continuous, and $2$ for second derivative. In this case, solutions to the strong form can only be functions in $\mathcal{C}^{2}$.

From Strong Form to Weak Form

It turns out, there is a way to solve these kind of problems, but we cannot use the strong form of the PDE. To see this, let's take our equation for the deflection of the string, and multiply it by a weight function $w$,
$$ - w \frac{d^{2} u}{dx^{2}} = w \frac{p}{T}, $$
where $w$ is any function that fulfills some properties. We don't know what these properties are yet, so we'll use $\mathcal{X}$ as short hand for "all functions that fulfill some properties we'll define later". Out of convenience, we can also use the following set notation: $w \in \mathcal{X}$, which is short hand for "$w$ is any function from $\mathcal{X}$".

Our next steps are to integrate this equation over the entire domain of the PDE,
$$
\int_{0}^{L} -w \frac{d^{2} u}{dx^{2}} dx = \int_{0}^{L} w\frac{p}{T} dx,
$$
and apply integration-by-parts,
$$
\int_{0}^{L} \frac{d w}{d x} \frac{d u}{dx} dx - \left[w\left(L\right)\frac{d u\left(L\right)}{d x} - w\left(0\right)\frac{d u\left(0\right)}{d x}\right] = \int_{0}^{L} w\frac{p}{T} dx.
$$
We've got rid of the second derivative, and replaced it with a derivative on $w$ instead — this is the weak form!

Lets go one step further and say that $u$ and $w$ should be functions from the same set of functions, $\mathcal{X}$. In that case, we can use the weak form to find any solution where $\int_{0}^{L} \left(\frac{d u}{dx}\right)^{2} dx$ is finite. It turns out many more functions fulfill this criteria than have finite $\frac{d^{2} u}{dx^{2}}$. The space of functions with finite $\int_{0}^{L} \left(\frac{d u}{dx}\right)^{2} dx$ is called $\mathcal{H}^{1}$ — $\mathcal{H}$ means Hilbert (after the mathematician David Hilbert), and $1$ for the first derivative. For completeness, it is worth noting that in the problem described above $P$ becomes the delta function, so the integral on the right-hand-side of our weak form is also well defined.

Closing Remarks

The strong form of the unsteady non-linear advection-diffusion-reaction equation is given by,
$$
\mathcal{L}\left(u\right) \equiv \frac{\partial u}{\partial t} + \nabla\cdot\left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + S\left(u, \grad u\right) - f\left(\vec{x}, t\right).
$$
Solutions to the strong form are found by solving,
$$
\mathcal{L}\left(u\right) = 0
$$
for $u$. This form of the PDE requires that $u$ has finite second derivitives over the entire domain.

The weak form of the unsteady non-linear advection-diffusion-reaction equation is given by,
$$
\begin{eqnarray*}
\mathcal{R}\left(u, w\right) \equiv \int_{\Omega} \Bigg[-\left(\frac{\partial w}{\partial t}\right)^{T} u - \left(\nabla w\right)^{T} \cdot \left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] + w^{T} S\left(u, \grad u\right) \\\ - w^{T} f\left(\vec{x}, t\right)\Bigg] dx + \int_{\partial\Omega} w \left[\vec{F}\left(u\right) - \kappa\left(u\right)\nabla u\right] ds.
\end{eqnarray*}
$$
Solutions to the weak form involves finding $u \in \mathcal{X}$, such that
$$
\mathcal{R}\left(u, w\right) = 0
$$
$\forall w \in \mathcal{X}$ (that is, for all $w$ in $\mathcal{X}$). Here, we've reduced (weakened as it were) the continuity requirements for solutions when compared to the strong form.

The weak form has advantages beyond the reduced continuity requirements. It is also much more amenable to mathematical analysis — results from functional analysis can be used to prove several important results for weak solutions.

We will be making significant use of the weak form of the unsteady non-linear advection-diffusion-reaction equation over the rest of this series.

Any questions or comments? Feel free to reach out to me on LinkedIn or Twitter and let me know!

An Overview of Output Based Adaption

Ben Couchman — Mon, 01 Feb 2021 00:49:42 GMT

Over the next few weeks, Cory Frontin and I are releasing a series of posts giving an overview of the mathematics that underpin output based adaption:

Introducing the Strong and Weak Forms
Galerkin Methods
Adjoint Fundamentals
Error Estimation
Metric Based Meshing
Putting it All Together: MOESS

The idea is to present the theory behind how output based adaption works, without going overboard on the mathematics. We’re trying to keep the posts bite-sized, at about 5-10 minute reads each. We'll start with how to mathematically describe the way many physical systems change in time (which includes how fluids move), before moving on to how we can describe the errors from approximately solving this class of problems. Then we’ll finish up with a few posts on a certain class of mesh generation methods, and how our output adaptive algorithm works end-to-end. We’re going to assume a base level of comfort with concepts and notation from integral and differential calculus. We will also be using some notation from more advanced mathematics, but these will be described and motivated as they are introduced.

Before getting into the mathematics, I thought I would share a result from my research as a quick motivation for output adaptive simulation. Let's imagine we have cylinder in a Mach 20 flow. If we were interested in how hot the cylinder would get, then we need to predict the heat flux over the surface of the cylinder. One way to do this is to solve the equations governing fluid dynamics over the cylinder and extract the heat flux from that solution.

Below I have two simulations of part of a 2D slice of the cylinder. In both cases, the surface of the cylinder is the curved boundary on the right hand side, and the Mach 20 air runs left-to-right. The difference in the two simulations is that the result on the left is from the first step of our adaptive algorithm, and the solution on the right is from the last step.

Using output based adaption, we can start with the low quality solution on the left, where we're completely missing how the temperature varies near the surface of the cylinder. Then interate (in a completly automated fashion) until we end up with a much higher quality solution like the solution on the right.

The only change that the output adaptive algorithm makes is to the underlying mesh on which we solve the problem. The grid has been made finer, both at the shock centerline (approximately in the middle of the domain) and at the surface of the cylinder. These changes are what result in the higher quality solution.

Any questions or comments? Feel free to reach out to me on LinkedIn or Twitter and let me know!

Output Adaptive Simulation: Is it Machine Learning?

Ben Couchman — Mon, 18 Jan 2021 00:56:47 GMT

Oh, short answer, "yes" with an "if." Long answer, "no" with a "but."

-- Reverend Lovejoy, "Hurricane Neddy"

When I explain my research to friends and family, they often ask me whether output adaptivity is some kind of machine learning. Initially, I was fairly certain that output adaptivity was not machine learning. However, the more I thought about it, the less certain I became. Now I think it may be fair to describe output adaptivity as a kind of machine learning for the grid generation process. I’m not an expert on machine learning, so I’ll try to stay away from strong claims, but I decided to lay out my thoughts anyway.

Lets start by defining what machine learning is. Steven Knox describes learning in the following way^[1]:

The Problem of Learning. There are a known set $\mathcal{X}$ and an unknown function $f$ on $\mathcal{X}$. Given data, construct a good approximation $\hat{f}$ of $f$. This is called learning $f$.

So, I think its fair to say that if we’re using an automated process to learn $f$, then we’re machine learning. With this definition in hand, lets see if there are any parallels between machine learning and output adaptive simulation.

For a given physical domain, we can think of there being a set of all possible valid grids we could construct within that domain. On each of those grids, we could calculate a solution to the governing equations, and from there calculate an output of interest (e.g. drag) with an associated discretization error. The purpose of output adaptive simulation is to find the grid (from the set of valid grids), which minimizes the discretization error. Referring back to Knox’s definition, we can define $\mathcal{X}$ as the set of possible grids, and $f$ is the discretization error in the specified output.

This is starting to sound like machine learning; we’re trying to optimize against a function which doesn’t have a closed form. But are we using machine learning to do it? That depends if we construct an approximation to $f$ using data as part of the optimization process.

In Professor David Darmofal’s research group, we use an output adaptive algorithm called Mesh Optimization via Error Sampling and Synthesis (MOESS)^[2].

The MOESS-magic happens in the box labeled "Adapt grid to control error". In this step we estimate the change in discretization error when locally refining the grid, and then use this data to construct a surrogate model of the error with respect to the grid. It is this surrogate model which we optimize against.

The construction of the surrogate model is what makes MOESS machine learning. We take training data (in the form of local refinements of a grid), learn the function between the grid and the discretization error (i.e. the surrogate model is $\hat{f}$), and then optimize against it. To me, at least, this sounds like using machine learning to generate optimal grids.

I have to admit, when I hear "machine learning for grid generation", I still think of something like taking a large number of hand generated grids over a range of problems and "machine learning" those, maybe with some kind of deep neural net. Output adaptive simulation is very different from that -- iteratively testing and modifying grids until an optimal grid is found. I'm not sure if the impression I have indicates that I am missing something in how I am assessing output adaption as machine learning. Does machine learning require big data? What about a priori models? Maybe my impressions of machine learning are based on marketing more than engineering.

What do you think? Am I missing something in how I’m looking at machine learning? Is the connection obvious? Reach out to me on LinkedIn or Twitter and let me know!

Steven .W. Knox (2018). Machine Learning: A Concise Introduction. John Wiley & Sons, Inc. https://doi.org/10.1002/9781119439868. ↩︎
Yano, Masayuki, and David L. Darmofal. “An optimization-based framework for anisotropic simplex mesh adaptation.” Journal of Computational Physics 231.22 (2012): 7626-7649. https://doi.org/10.1016/j.jcp.2012.06.040. ↩︎