AWLS Extension

Although the modified moving least squares (MMLS) method in DTK2 is an advanced data remapping method, there is still room for improvement.

  1. Whenever dealing with Vandermonde systems, stability must be taken into consideration.
  2. The local support of the Vandermonde systems should be adapted to the properties of the local stencils.

The MLS method solves the local problem via the normal equations, which squares the condition number of the system. MMLS solves the local system with a truncated singular value decomposition (TSVD), but the truncated terms in the Vandermonde systems cannot be controlled. Note that performing the TSVD requires a condition number estimator; MMLS uses the LAPACK estimator for general matrices, which requires an LU factorization and thus another O(n^3) FLOPs.

Regarding implementation details, MMLS uses global Cartesian coordinates to construct the Vandermonde matrices and calls the SVD incrementally to ensure the resulting systems are full rank; this procedure is both less stable and more costly.

For the local stencil choice, directly using the query results is risky, because the stencil might be too large. In general, the number of points in the local stencils should be determined by the number of columns of the Vandermonde systems.

Moreover, after the data has been remapped, an intermediate target solution is obtained. Combined with the input source data, an a posteriori error analysis can be performed in order to fix non-smooth values.

With these considerations, we have implemented adaptive weighted least squares (AWLS) as an extension of the meshless methods in DTK2.

Weighted Least Squares (WLS) Formulation

Formulation

Given the source input discrete function values \boldsymbol{f}^s and target output \boldsymbol{f}^t, we want to construct a transfer operator \boldsymbol{T}, s.t.

(1) \boldsymbol{f}^t = \boldsymbol{T}\boldsymbol{f}^s

Clearly, such an operator is rectangular with size n by m, where n is the number of nodes in the target grid and m is the number of nodes in the source grid.

Therefore, for each target node i, we have an “interpolation” that reads:

(2) f_i^t = T_{i,\boldsymbol{J}}\boldsymbol{f}_{\boldsymbol{J}}^s

where \boldsymbol{J} is the local support stencil around target node i. Denote \boldsymbol{c}^T=T_{i,\boldsymbol{J}}, localize the stencil \boldsymbol{J} around the target node \boldsymbol{x}_i, and denote the localized coordinates by \boldsymbol{u}, i.e. \boldsymbol{u}=\boldsymbol{x}-\boldsymbol{x}_i. We can then build the generalized Vandermonde system (assuming 2D):

(3) \boldsymbol{V} =
    \left[
    \begin{array}{*{6}{c}}
        \boldsymbol{1} & \boldsymbol{u}_1 & \boldsymbol{u}_2 & \boldsymbol{u}_1^2 & \boldsymbol{u}_1\boldsymbol{u}_2 & \boldsymbol{u}_2^2 \\
    \end{array}
    \right]

So the coefficients \boldsymbol{c} can be solved for via the following fitting problem:

(4) \boldsymbol{V}^T\boldsymbol{c} = \boldsymbol{e}_1

If \boldsymbol{V} is a square system, i.e. a typical Vandermonde system, then (4) is just an interpolation problem. When we have more points than the number of columns in \boldsymbol{V}, the system is rectangular and thus requires least squares minimization.

Note that a more intuitive derivation of (4) is to first form a system of polynomial coefficients for \boldsymbol{J} and then explicitly evaluate it at any point in \boldsymbol{u}:

(5)\begin{eqnarray*}
\boldsymbol{V}\boldsymbol{C}&=\boldsymbol{I} \\
\boldsymbol{C}&=\boldsymbol{V}^+
\end{eqnarray*}

Where \boldsymbol{V}^+ is the pseudo-inverse of \boldsymbol{V}. Given a point \boldsymbol{p} in the local coordinate system \boldsymbol{u}, its Vandermonde component is:

(6) \boldsymbol{p}_V = [1\ p_1\ p_2\ p_1^2\ p_1p_2\ p_2^2]^T

Then evaluating (6) in the system of polynomial coefficients, i.e. (5), amounts to computing \boldsymbol{p}_V^T\boldsymbol{C}. It’s worth noting that for the fitting problem, the query point is always the center (the target node of interest) and thus has the Vandermonde component [1\ 0\ 0\ 0\ 0\ 0]^T. Therefore, the explicit computation reduces to (4).
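The construction above can be illustrated with a short snippet. The following is a minimal NumPy sketch (not the actual DTK2/parpydtk2 implementation) that builds the 2D quadratic generalized Vandermonde matrix (3) for a hypothetical localized stencil and recovers the fitting coefficients of (4) through the pseudo-inverse in (5); the scaling and weighting discussed next are omitted here.

import numpy as np

def vander2d(u):
    """Generalized Vandermonde matrix (3) for localized 2D stencil coordinates u."""
    u1, u2 = u[:, 0], u[:, 1]
    return np.column_stack([np.ones_like(u1), u1, u2, u1**2, u1 * u2, u2**2])

rng = np.random.default_rng(0)
u = rng.random((10, 2)) - 0.5        # hypothetical localized stencil, u = x - x_i
V = vander2d(u)                      # 10-by-6 rectangular system
C = np.linalg.pinv(V)                # C = V^+ as in (5)
e1 = np.zeros(6); e1[0] = 1.0        # Vandermonde component of the center point, cf. (6)
c = C.T @ e1                         # fitting coefficients, i.e. the row T_{i,J} in (2)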

It is well known that Vandermonde systems are ill-conditioned, so a balancing technique is needed. A typical way is to apply a column scaling: \boldsymbol{V}\boldsymbol{S}. Typical choices of the diagonal matrix \boldsymbol{S} are 1) algebraic scaling based on the norms of the column vectors of \boldsymbol{V} and 2) geometric scaling based on the radius of the local stencil \boldsymbol{J}. Here we choose the latter, and the local stencil radius is chosen as:

(7) r = \max\limits_{j\in\boldsymbol{J}}(\max\limits_{1\le d\le D}(\left\vert\boldsymbol{u}_d^j\right\vert))

Where D is the spatial dimension. Then the column scaling matrix is (assuming 2D):

(8) \boldsymbol{S} =
    \left[
    \begin{array}{*{6}{c}}
        1 & 0 & 0 & 0 & 0 & 0 \\
        0 & 1/r & 0 & 0 & 0 & 0 \\
        0 & 0 & 1/r & 0 & 0 & 0 \\
        0 & 0 & 0 & 1/r^2 & 0 & 0 \\
        0 & 0 & 0 & 0 & 1/r^2 & 0 \\
        0 & 0 & 0 & 0 & 0 & 1/r^2 \\
    \end{array}
    \right]

Note

\boldsymbol{S} doesn’t affect the solution.

Now, the least squares problem (5) can be formulated as:

(9) \min(\left\Vert\boldsymbol{V}\boldsymbol{C}-\boldsymbol{I}\right\Vert_{\boldsymbol{W}})

We choose to use the family of radial basis functions (RBFs) as the diagonal row weighting matrix \boldsymbol{W}.

Note

\boldsymbol{W} does affect the solution.

With (8) and (9), we have a weighted and balanced generalized Vandermonde system:

(10) \hat{\boldsymbol{V}} = \boldsymbol{W}\boldsymbol{V}\boldsymbol{S}

Plugging (10) into (4) and reorganizing, we have:

(11)\begin{eqnarray*}
\hat{\boldsymbol{V}}^T\boldsymbol{W}^{-1}\boldsymbol{c}&=\boldsymbol{S}\boldsymbol{e}_1 \\
\hat{\boldsymbol{V}}^T\hat{\boldsymbol{c}}&=\boldsymbol{e}_1
\end{eqnarray*}

Where \hat{\boldsymbol{c}}=\boldsymbol{W}^{-1}\boldsymbol{c}, and we have used \boldsymbol{S}\boldsymbol{e}_1=\boldsymbol{e}_1, which holds because the first diagonal entry of \boldsymbol{S} is 1.
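As an illustration, the following continues the previous sketch and assembles the scaled and weighted system (10). The stencil radius follows (7), the geometric scaling follows (8), and a Wendland C2 function is used as one possible choice from the RBF family for \boldsymbol{W}; the slight enlargement of the support is an assumption of this sketch, not part of the formulation.

r = np.abs(u).max()                                     # stencil radius as in (7)
S = np.diag([1.0, 1/r, 1/r, 1/r**2, 1/r**2, 1/r**2])    # geometric column scaling (8)

dist = np.linalg.norm(u, axis=1)
d = dist / (1.1 * dist.max())                           # normalized distances to the target node
w = np.maximum(1.0 - d, 0.0)**4 * (4.0 * d + 1.0)       # Wendland C2 weights (one RBF choice)
W = np.diag(w)                                          # diagonal row weighting matrix

V_hat = W @ V @ S                                       # weighted and balanced system (10)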

Solving WLS

We follow the truncated QR with column pivoting (TQRCP) technique introduced in [1] to solve problem (11).

The first step is to decompose \hat{\boldsymbol{V}} with QRCP:

(12) \hat{\boldsymbol{V}}\boldsymbol{P} = \boldsymbol{Q}\boldsymbol{R}

The truncation step utilizes a condition number estimator for the upper triangular system \boldsymbol{R} to report the numerical rank of the system, k. Combining this with (12) and plugging into (11), we have:

(13)\begin{eqnarray*}
\boldsymbol{P}_{:,1:k}\boldsymbol{R}_{1:k,1:k}^T\boldsymbol{Q}_{:,1:k}^T\hat{\boldsymbol{c}}&=\boldsymbol{e}_1 \\
\boldsymbol{R}_{1:k,1:k}^T\boldsymbol{Q}_{:,1:k}^T\hat{\boldsymbol{c}}&=\boldsymbol{P}_{:,1:k}^T\boldsymbol{e}_1 \\
\boldsymbol{Q}_{:,1:k}^T\hat{\boldsymbol{c}}&=\boldsymbol{R}_{1:k,1:k}^{-T}\boldsymbol{P}_{:,1:k}^T\boldsymbol{e}_1 \\
\hat{\boldsymbol{c}}&=\boldsymbol{Q}_{:,1:k}\boldsymbol{R}_{1:k,1:k}^{-T}\boldsymbol{P}_{:,1:k}^T\boldsymbol{e}_1 \\
\boldsymbol{c}&=\boldsymbol{W}\boldsymbol{Q}_{:,1:k}\boldsymbol{R}_{1:k,1:k}^{-T}\boldsymbol{P}_{:,1:k}^T\boldsymbol{e}_1
\end{eqnarray*}

This procedure is very efficient and robust. The dominant computational cost comes from QRCP, i.e. O(n^3), where n is some measure of the size of the local problems. It’s worth noting that QRCP is called only once for each local problem.
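The following SciPy sketch continues the example above and solves (11) via (12) and (13). For simplicity, the numerical rank k is estimated with a crude diagonal-ratio test on \boldsymbol{R}; this is a stand-in for the condition number estimator used in the actual implementation.

from scipy.linalg import qr, solve_triangular

Q, R, piv = qr(V_hat, mode='economic', pivoting=True)   # QRCP of V_hat, cf. (12)
diag = np.abs(np.diag(R))
k = int(np.sum(diag > 1e-10 * diag[0]))                  # crude numerical rank estimate

rhs = e1[piv][:k]                                        # P_{:,1:k}^T e_1
y = solve_triangular(R[:k, :k], rhs, trans='T')          # apply R_{1:k,1:k}^{-T}
c_hat = Q[:, :k] @ y                                     # \hat{c} as in (13)
c = w * c_hat                                            # c = W \hat{c}

Each row of the transfer operator \boldsymbol{T} in (1) is then c scattered into the columns given by the stencil \boldsymbol{J}, as in (2).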

[1] R. Conley, T. J. Delaney, and X. Jiao. Overcoming element quality dependence of finite elements with adaptive extended stencil FEM (AES-FEM). Int. J. Numer. Meth. Engrg., vol. 108, pp. 1054–1085, 2016.

Improving the Strategy of Choosing Local Stencil

The original DTK2 stencil choice is based on a global configuration parameter of either knn (k nearest neighbors) or a search radius. Moreover, the resulting neighbors are used directly as the support of the local problems, i.e. the number of points used in the Vandermonde systems. This procedure is not robust for point clouds that are not uniformly distributed. To overcome this, we have implemented the following three steps for querying and choosing local supports; a small sketch follows the notes below.

  1. Perform a knn search with a small number of neighbors to estimate the radius h of the one-ring neighborhood of the local point cloud.
  2. Perform a radius search with a relatively large radius so that we have enough candidates for the next step. The radius is chosen as r=ah.
  3. Choose the support stencil \boldsymbol{J} from the candidates based on the number of columns (coefficients) in the Vandermonde systems, i.e. \left\vert\boldsymbol{J}\right\vert=\rho c_d. Notice that c_d is 3, 6, and 10 for dimension d = 1, 2, and 3, respectively, with quadratic polynomial basis functions.

Note

We choose a=5 for candidate selection and observe that better results can be obtained with \rho\approx 3.

Note

Theoretically, steps 1 and 2 could be achieved with a single knn search if we could determine the number of points needed in step 3 up front. However, since we have run into robustness issues with DTK2 when using knn as the primary search mechanism, we rely on the radius search.
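A minimal self-contained sketch of the three steps is given below, using a SciPy k-d tree in place of the DTK2 search backend; the point cloud, target node, and the knn count in step 1 are purely illustrative, and a and \rho follow the note above.

import numpy as np
from scipy.spatial import cKDTree

a, rho, c_d = 5.0, 3.0, 6              # a and rho as in the note above; c_d = 6 in 2D
rng = np.random.default_rng(1)
source = rng.random((200, 2))          # hypothetical source point cloud
x = np.array([0.5, 0.5])               # target node
tree = cKDTree(source)

# step 1: small knn query to estimate the one-ring radius h
d_knn, _ = tree.query(x, k=4)
h = d_knn.max()

# step 2: radius search with r = a*h to gather enough candidates
cand = np.asarray(tree.query_ball_point(x, a * h))

# step 3: keep the |J| = rho*c_d nearest candidates as the stencil J
n_stencil = min(len(cand), int(rho * c_d))
order = np.argsort(np.linalg.norm(source[cand] - x, axis=1))
J = cand[order[:n_stencil]]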

With this strategy, the user doesn’t need to provide a radius. We still believe a topology-based query would be more efficient and more accurate, but it requires meshes, which is an arduous requirement for this work.

Automated Parallel Mesh Rendezvous

When using DTK2 in parallel, each operator is built based on the Cartesian bounding box of the target point cloud with an approximate extension. For this extension length, we have implemented the following strategy:

(14) r = \max(\alpha h_b, \frac{\beta h_b}{N^{1/d}}, r_u)

Where h_b is the longest edge of the Cartesian bounding box, N is the number of nodes, d is the topological dimension, and r_u is the user-provided radius extension. Notice that the second term essentially estimates the average cell size of the point cloud.

Note

We choose \alpha=0.1 (10%) and \beta=5 as default parameters.
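For reference, (14) translates into a short function like the one below (a minimal sketch, not the parpydtk2 implementation); pts is a hypothetical (N, d) array of point coordinates and r_u the user-provided extension.

import numpy as np

def extension_radius(pts, dim, r_u=0.0, alpha=0.1, beta=5.0):
    """Bounding-box extension estimate following (14)."""
    h_b = np.max(pts.max(axis=0) - pts.min(axis=0))   # longest bounding-box edge
    N = pts.shape[0]
    return max(alpha * h_b, beta * h_b / N ** (1.0 / dim), r_u)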

Adaptation with Non-smooth Functions

A common challenge in solution/data remapping is handling non-smooth functions, i.e. avoiding the Gibbs phenomenon. We apply a simple bounding strategy that enforces that the transferred solution is bounded locally with respect to the source values in the stencil \boldsymbol{J}.

Given the intermediate solutions \hat{\boldsymbol{f}}^t after applying the WLS fitting, the following limiter is applied:

(15) f_i^t =
\begin{cases}
    \hat{f}_i^t & \hat{f}_i^t \in \text{range}(\boldsymbol{f}_{\boldsymbol{J}}^s) \\
    \max(\boldsymbol{f}_{\boldsymbol{J}}^s) & \hat{f}_i^t > \max(\boldsymbol{f}_{\boldsymbol{J}}^s) \\
    \min(\boldsymbol{f}_{\boldsymbol{J}}^s) & \text{otherwise}
\end{cases}

In solution transfer, the strategy above may clip high-order values to first order in smooth regions; this is particularly true when the resolutions of the two grids differ greatly. Therefore, we need a mechanism that can pre-screen the non-smooth regions. We utilize a gradient-based smoothness indicator, similar to those used in WENO schemes.

(16) \frac{\left\vert\hat{f}_i^t-f_1^s\right\vert}{\epsilon h} \le \sigma

where \epsilon=\max\limits_{j\in\boldsymbol{J}}(\left\vert f_j^s\right\vert), f_1^s is the source value at the closest point in the stencil, and h is the local mesh-size estimate from the stencil query. This scheme is efficient and doesn’t require geometry data. The drawback is also obvious: it is too simple, and tuning \sigma is not easy.
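The limiter (15) together with the pre-screening test (16) can be summarized in the following sketch. Here f_hat is the WLS value at one target node, fJ holds the source values in its stencil ordered so that fJ[0] corresponds to f_1^s, and h is the local mesh-size estimate; these names and the ordering are assumptions of this illustration rather than the exact implementation.

import numpy as np

def limit(f_hat, fJ, h, sigma=2.0):
    """Apply the bound limiter (15) only where the smoothness test (16) fails."""
    eps = np.max(np.abs(fJ))
    if eps == 0.0 or abs(f_hat - fJ[0]) / (eps * h) <= sigma:
        return f_hat                                  # region looks smooth, keep the WLS value
    return min(max(f_hat, fJ.min()), fJ.max())        # otherwise clip into the local source range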

Results

Convergence Tests

We have tested the AWLS extension on the following settings:

  1. plane surface,
  2. spherical surface, and
  3. torus surface.

For the plane surface, we use structured grids on one side and triangular meshes on the other. For setting 2, we use cubed-sphere grids and triangular meshes. For the last case, we use triangular meshes with different resolutions. All grids are uniformly refined through three levels in order to study the convergence. Notice that only the coarsest levels are shown in the following figures, and all tests transfer data from the fine grids to the corresponding coarse ones.

Finer Grids Coarser Grids
_images/flat_quad_grids.png _images/flat_tri_grids.png
_images/sphere_tri_grids.png _images/sphere_quad_grids.png
_images/torus_fine_grids.png _images/torus_coarse_grids.png

The convergence metric is:

c = \left\vert\frac{\log_2(e_3/e_1)}{\log_2(h_3/h_1)}\right\vert

Where h is some consistent measure of the grid resolution. The error metric we use is the relative \ell_2 error:

e = \frac{\left\Vert f_h-f\right\Vert_2}{\left\Vert f\right\Vert_2}
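For completeness, these two metrics translate directly into NumPy; f_h and f are the transferred and exact nodal values on a given level, and the subscripts 1 and 3 denote the coarsest and finest of the three refinement levels.

import numpy as np

def rel_l2(f_h, f):
    """Relative l2 error between transferred and exact nodal values."""
    return np.linalg.norm(f_h - f) / np.linalg.norm(f)

def conv_rate(e1, e3, h1, h3):
    """Observed convergence rate across three uniform refinements."""
    return abs(np.log2(e3 / e1) / np.log2(h3 / h1))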

For the plane surface, the following two models are tested:

  1. f(x,y)=e^{x+y}, and
  2. f(x,y)=\sin(\frac{\pi x}{2})\cos(\frac{\pi y}{2}).

We use WENDLAND21 as the weighting scheme and choose \rho=3 (18 points in the stencils); the results are:

_images/plane_conv.png

Convergence of the plane surface

For the 3D cases, we choose the following models:

  1. f(x,y,z)=(\sin(x)+\cos(y))z, and
  2. f(x,y,z)=e^{x+y+z}.

We use WENDLAND21 as the weighting scheme and choose \rho=3.2 (32 points in the stencils) for the spherical surface and \rho=2.3 (23 points) for the torus surface; the results are:

_images/sph_conv1.png

Convergence of the spherical surface

_images/torus_conv.png

Convergence of the torus surface

Note that for all cases, super-convergence is observed, i.e. the convergence rate is greater than order p+1, where p=2 is the degree of the polynomial basis.

The torus program script can be obtained here, and the corresponding grids are stored in a compressed npz file that can be downloaded here.

Since the spherical surface is special due to its smoothness and symmetry, almost (p+2)-nd order super-convergence can be obtained with large stencils and the WU2 weighting scheme. The following results are with \rho=6.8 (68 points in the stencils):

_images/sph_conv2.png

However, this is less practical and hard to achieve on general surfaces. In fact, we did not observe it in the torus setting.

You can obtain the test grids in VTK format here.

Non-smooth Functions

As a preliminary report, we perform the discontinuity test with the Heaviside function on the plane surface, with \sigma=2. The results are:

_images/hv_os.png

Heaviside function w/o resolving non-smooth regions

_images/hv_noos.png

Heaviside function with resolving non-smooth regions

Usage

In order to use AWLS, you need to install the DTK2 package from my personal forked version.

Note

I decided not to make a PR because DTK2 is probably deprecated in the official repository.

To determine whether the backend is the official DTK2, a static member function is implemented:

>>> from parpydtk2 import *
>>> assert not Mapper.is_dtk2_backend()

To enable AWLS support, you just need to call awls_conf():

>>> mapper.awls_conf()

The default settings are:

  1. method is set to AWLS,
  2. basis is set to WENDLAND21,
  3. \alpha is set to 0.1 in (14),
  4. \beta is set to 5 in (14), and
  5. \rho is set to 3 for choosing local stencils.

A complete list of parameters is:

>>> mapper.awls_conf(
...     ref_r_b=...,
...     ref_r_g=...,
...     dim=...,
...     alpha=...,
...     beta=...,
...     basis=...,
...     rho=...,
...     verbose=True,
... )

Where ref_r_b and ref_r_g are r_u in (14) for the blue and green participants. dim is the topological dimension used in (14), which is important if you need to transfer surface data (90% of cases) embedded in 3D space, in order to properly estimate the mesh cell sizes.

In order to resolve discontinuous solutions, you need to pass in at least one additional parameter resolve_disc to transfer_data():

>>> bf_tag, gf_tag = 'foo', 'bar'
>>> mapper.transfer_data(bf_tag, gf_tag, direct=B2G, resolve_disc=True)

The default \sigma in (16) is 2. To use another value, simply do:

>>> mapper.transfer_data(...,resolve_disc=True, sigma=...)

Note

Resolving discontinuous solutions only works with the AWLS method under the CHIAO45 backend.