Part 7: Iterative Solvers and Memory Strategy

Back to regularizeNd series home

Previous: Part 6

This part focuses on the practical decision of which solver to use and why memory and factorization can become the binding constraint for large problems.

Practical Goal

  1. Choose between direct and iterative solvers based on problem scale.
  2. Understand when memory limits dominate solver selection.
  3. Apply a repeatable fallback strategy when direct solves are impractical.

Example Anchor

Primary script: regularizeNd/Examples/a07_exampleIterativeSolver.m

Primary data file: a07_iterative_solver_data.mat

The Full Solver Menu

regularizeNd accepts the solver argument as the sixth positional parameter:

yGrid = regularizeNd(x, y, xGrid, smoothness, interpMethod, solver)
| Solver | Type | When to use |
|--------|------|-------------|
| "normal" | Direct (normal equations) | Fast direct baseline when memory is sufficient; often ~3× faster than "\" for well-conditioned systems. |
| "\" | Direct (backslash / sparse QR-LU) | Most robust; fall back when "normal" produces unexpected results. |
| "pcg" | Iterative (conjugate gradient) | Large problems where direct solvers exhaust RAM. First iterative choice. |
| "symmlq" | Iterative (symmetric LQ) | Very similar to "pcg"; try it if "pcg" stalls or diverges. |
| "lsqr" | Iterative (least-squares QR) | Last resort when "pcg" and "symmlq" fail; solves \(A\mathbf{x}=\mathbf{b}\) directly rather than the normal equations. Slower but the most numerically stable of the three iterative options. |

Pathology Summary (From a07 Example)

The attached a07 example is a 5D case with grid:

\[ 25 \times 7 \times 7 \times 12 \times 8 = 117{,}600 \]

Observed matrix sizes:

  • \(A \in \mathbb{R}^{499{,}748 \times 117{,}600}\), sparse, \(\operatorname{nnz}(A)=2{,}582{,}568\).
  • \(b \in \mathbb{R}^{499{,}748}\).

Two row families appear in \(A\):

  • 37,356 rows with 32 nonzeros (data-fidelity rows from \(2^5\) multilinear interpolation stencils).
  • 462,392 rows with 3 nonzeros (second-order regularization rows).
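A quick arithmetic check confirms that these two row families account for every nonzero in \(A\):

```matlab
% Sanity check: the two row families reproduce nnz(A) exactly.
nnzFidelity = 37356 * 32;      % 2^5-point stencils, one row per data point
nnzReg      = 462392 * 3;      % [1 -2 1] second-difference rows
nnzA = nnzFidelity + nnzReg
% nnzA = 2582568
```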

This is why the problem feels pathological: \(A\) itself is modest, but direct factorization of the normal system creates massive fill.

What Is a Stride?

A stride is the index jump in flattened column-major storage when you move by +1 in one grid dimension.

For grid \([25, 7, 7, 12, 8]\), strides are:

| Dimension | Grid size | Stride |
|-----------|-----------|--------|
| 1 | 25 | 1 |
| 2 | 7 | 25 |
| 3 | 7 | 175 |
| 4 | 12 | 1,225 |
| 5 | 8 | 14,700 |
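In code, the strides are just cumulative products of the grid sizes; a one-liner reproduces the table above:

```matlab
% Column-major strides: each stride is the product of all earlier grid sizes.
gridSize = [25 7 7 12 8];
strides = [1, cumprod(gridSize(1:end-1))]
% strides = [1 25 175 1225 14700]
```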

Second-order regularization uses a stencil of the form \([1,-2,1]\) across indices \([j, j+s, j+2s]\) where \(s\) is the stride for that dimension.

In dimension 5, \(s=14{,}700\), so rows can connect nodes \(j\) and \(j+29{,}400\). That produces direct edges in \(A^\top A\) at bandwidth 29,400.
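To make that concrete, here is a sketch of how the dimension-5 regularization rows could be assembled. regularizeNd builds these rows internally; this sketch only mirrors the structure described above:

```matlab
% Sketch: second-difference rows along dimension 5 (s = 14,700).
% The simple index range works because dimension 5 is outermost:
% indices 1..n-2*s are exactly the nodes with two higher neighbors there.
n = 117600; s = 14700;
j = (1:(n - 2*s))';                   % 88,200 stencil centers
cols = [j, j + s, j + 2*s];           % columns up to 2*s = 29,400 apart
rows = repmat((1:numel(j))', 1, 3);
vals = repmat([1 -2 1], numel(j), 1);
L5 = sparse(rows(:), cols(:), vals(:), numel(j), n);
% Every row of L5 links node j to node j + 29,400, which is what puts
% bandwidth-29,400 edges into A'*A.
```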

Why Direct Methods Blow Up in RAM

Even though \(A^\top A\) is still sparse, Cholesky fill-in is enormous for this graph structure.

  • \(\operatorname{nnz}(A^\top A) \approx 2.27\times10^6\)
  • Symbolic AMD Cholesky gives \(\operatorname{nnz}(L) \approx 1.81\times10^9\)
  • Sparse storage for \(L\) is about 21.8 GB, and factorization peak memory is about 44 GB.
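These numbers can be reproduced symbolically, without ever attempting the numeric factorization. A sketch, assuming \(A\) has been loaded from the a07 data:

```matlab
% Symbolic fill-in estimate: cheap, no numeric factorization performed.
AtA = A' * A;                    % ~2.27e6 nonzeros
p = amd(AtA);                    % approximate minimum degree ordering
c = symbfact(AtA(p, p));         % per-column counts of the Cholesky factor
nnzL = sum(c);                   % ~1.81e9 for this problem
fprintf('nnz(L) ~ %.3g, storage ~ %.1f GB\n', nnzL, nnzL * 12 / 2^30);
```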

So direct methods are not “failing randomly”; they are hitting a predictable structural memory wall.

The practical consequence is:

  • "normal" can be much faster than "\" when it fits.
  • Both direct methods can become impractical once fill-in dominates available RAM.
  • Iterative methods become the preferred strategy for this case.

Measured Timing (From a07)

These are measured values from the a07 example data and script:

| Solver | Approximate solve time |
|--------|------------------------|
| "normal" | ~930 s |
| "\" | ~8,266 s |
| "pcg" | ~53 s |
| "symmlq" | ~53 s |
| "lsqr" | ~378 s |

At this scale, "pcg" and "symmlq" are roughly 17× faster than "normal" and more than 150× faster than "\"; even "lsqr", the slowest iterative option, beats both direct solvers.

Why Iterative Solvers Work Here

Iterative methods avoid explicit factorization and rely on matrix-vector products with \(A\) and \(A^\top\). Memory stays close to \(O(\operatorname{nnz}(A))\) plus a few working vectors, which is why they remain practical in this problem class.
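The mechanism is easy to see with MATLAB's built-in pcg. A minimal sketch of the principle, assuming \(A\) and \(b\) have already been assembled (regularizeNd does this internally):

```matlab
% Matrix-free solve of the normal equations A'A x = A'b.
% Only products with A and A' are needed; no factor is ever stored.
afun = @(v) A' * (A * v);        % apply A'A without forming it explicitly
rhs  = A' * b;
[x, flag, relres, iter] = pcg(afun, rhs, 1e-6, 1000);
```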

For this specific a07 case, if your workstation has limited RAM, go straight to the iterative solvers.

Controlling Iterative Solver Behavior

Two additional optional positional arguments are available:

yGrid = regularizeNd(x, y, xGrid, smoothness, interpMethod, solver, ...
    maxIterations, solverTolerance)
| Argument | Default | Effect |
|----------|---------|--------|
| maxIterations | solver default | Maximum number of iterations before early exit |
| solverTolerance | solver default | Relative residual tolerance for convergence |

Tighter tolerances increase iteration count and runtime. For engineering lookup tables, a tolerance of 1e-6 is almost always sufficient.
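For example, a call with explicit iteration and tolerance settings might look like this (the values below are illustrative, not measured defaults):

```matlab
% Hypothetical settings: cap iterations, accept a 1e-6 relative residual.
maxIterations   = 5000;
solverTolerance = 1e-6;
yGrid = regularizeNd(inputs, output, xGrid, smoothness, 'linear', 'pcg', ...
    maxIterations, solverTolerance);
```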

Decision Tree

flowchart TD
    A([Start]) --> B{1D, 2D, or 3D?}
    B -- Yes --> C(["Use 'normal' ✓"])
    B -- No, 4D or higher --> D{"Last dim has large stride?<br/>stride &gt; n / 10"}
    D -- Yes, fill explosion likely --> G
    D -- No --> E["Try 'normal'"]
    E --> F{"Fast enough<br/>and fits in RAM?"}
    F -- Yes --> Done1(["✓ Done"])
    F -- No, slow or out of memory --> G["Try 'pcg'"]
    G --> H{Converges?}
    H -- Yes --> Done2(["✓ Done"])
    H -- No --> I["Try 'symmlq'"]
    I --> J{Converges?}
    J -- Yes --> Done3(["✓ Done"])
    J -- No --> K["Try 'lsqr'<br/>last resort"]
    K --> L{Converges?}
    L -- Yes --> Done4(["✓ Done"])
    L -- No --> M(["Reduce grid resolution<br/>in the highest-stride dimension,<br/>or split into subdomains"])

Live Code Walkthrough (Short Blocks, Mocked Output)

The blocks below are intentionally short and marked eval: false so the page renders quickly. Outputs are mocked from measured runs.

Load the a07 data

clc; clear; close all;
load('a07_iterative_solver_data.mat');

smoothness = [0.001 0.01 0.01 0.001 0.01];
interpMethod = "linear";

Direct solvers

tic;
[~] = regularizeNd(inputs, output, xGrid, smoothness, interpMethod, 'normal');
toc;
[Mock output]
Elapsed time is 930.022098 seconds.
tic;
[~] = regularizeNd(inputs, output, xGrid, smoothness, interpMethod, '\');
toc;
[Mock output]
Elapsed time is 8266.395394 seconds.

Iterative solvers

tic;
[~] = regularizeNd(inputs, output, xGrid, smoothness, interpMethod, 'pcg');
toc;
[Mock output]
Elapsed time is 53.382053 seconds.
tic;
[~] = regularizeNd(inputs, output, xGrid, smoothness, interpMethod, 'symmlq');
toc;
[Mock output]
Elapsed time is 52.895435 seconds.
tic;
[~] = regularizeNd(inputs, output, xGrid, smoothness, interpMethod, 'lsqr');
toc;
[Mock output]
Elapsed time is 377.954214 seconds.

Compression Approach for Very Large Grids

If you cannot reduce grid size without sacrificing quality, two practical strategies:

  1. Dimension-by-dimension refinement — keep coarse in all dimensions except the one that most drives quality, then refine that dimension alone. The grid count grows linearly, not exponentially.

  2. Local fitting — partition the domain into overlapping subdomains, fit each independently, and stitch with a smooth blending function. Not built into regularizeNd but straightforward to implement around it.
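A sketch of strategy 1 for the a07 grid (the grid vectors here are hypothetical placeholders, not the actual a07 axes):

```matlab
% Keep four dimensions coarse, refine only the quality-driving dimension.
xGrid = {linspace(0, 1, 25), linspace(0, 1, 7), linspace(0, 1, 7), ...
         linspace(0, 1, 12), linspace(0, 1, 8)};
xGrid{1} = linspace(0, 1, 50);   % double resolution in dimension 1 only
% Node count goes 117,600 -> 235,200: linear growth, not exponential.
```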

Memory Footprint Reality Check for This Example

For this specific a07 case:

| Quantity | Approximate value |
|----------|-------------------|
| \(\operatorname{nnz}(A)\) | 2,582,568 |
| \(\operatorname{nnz}(A^\top A)\) | 2,266,280 |
| \(\operatorname{nnz}(L)\) (symbolic AMD Cholesky) | \(1.81\times10^9\) |
| \(L\) sparse storage | ~21.8 GB |
| Factorization peak memory | ~44 GB |

This is why direct methods can run out of RAM/swap even when \(A\) appears modest.

Memory Footprint Estimate (Rule of Thumb)

A rough estimate of peak memory before the solve step for an \(n\)-node grid:

| Component | Memory |
|-----------|--------|
| \(A_\text{fidelity}\) (sparse) | \(\approx 2^D \cdot N_\text{data} \cdot 8\) bytes (linear interp) |
| \(L_\text{reg}\) per dimension | \(\approx 3 \cdot n \cdot 8\) bytes (tridiagonal) |
| \(A^\top A\) (SPD, direct) | \(\approx n^2 \cdot 8\) bytes worst case |

Where \(D\) is the number of input dimensions and \(N_\text{data}\) is the scatter data count. The \(A^\top A\) term is the one that kills direct solvers at large \(n\).
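Plugging the a07 numbers into the rule of thumb shows how small everything except the factorization is:

```matlab
% Pre-solve memory estimate for the a07 case (values from this part).
D = 5; Ndata = 37356; n = 117600;
bytesFidelity = 2^D * Ndata * 8;   % ~9 MB of fidelity-row values
bytesReg      = D * 3 * n * 8;     % ~13 MB across five tridiagonal operators
fprintf('fidelity ~ %.0f MB, regularization ~ %.0f MB\n', ...
    bytesFidelity / 2^20, bytesReg / 2^20);
% Compare with ~21.8 GB for L: the Cholesky factor dominates by three
% orders of magnitude.
```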

Checklist Before Leaving This Part

  • You can state which of the five solver options are direct and which are iterative.
  • You can explain why Cholesky fill-in, not \(\operatorname{nnz}(A)\), is what exhausts RAM for direct solves.
  • You can compute the strides of a grid and identify its highest-stride dimension.
  • You know the fallback order: "normal" → "pcg" → "symmlq" → "lsqr" → reduce the grid.

Next

Part 8: Lineage and Practical Migration Notes