I recently worked through a constrained least-squares problem where my intuition was wrong: OSQP solved faster and more reliably with internal scaling turned off.
I expected scaling to help ADMM. On this dataset, the opposite happened.
Problem setup
I have data matrices with these dimensions:
- \(C \in \mathbb{R}^{3086 \times 1000}\)
- \(A \in \mathbb{R}^{1999 \times 1000}\)
- \(d \in \mathbb{R}^{3086}\)
- \(b \in \mathbb{R}^{1999}\)
The optimization is a constrained least-squares problem:
\[ \min_x \frac{1}{2}\lVert Cx-d \rVert_2^2 \quad \text{s.t.} \quad Ax \le b. \]
I tested both QP formulations:
- Normal-equations form: \(P=C^TC\), \(q=-C^Td\).
- Slack form (recommended for OSQP):
\[ \min_{x,y} \frac{1}{2} y^T y, \quad \text{s.t.} \quad y = Cx-d, \ Ax \le b. \]
Why this looked numerically difficult
The scales are very different:
- \(\lVert C \rVert_\infty \approx 9.93 \times 10^7\)
- \(\lVert A \rVert_\infty \approx 2\)
- scale ratio \(\lVert C \rVert_\infty / \lVert A \rVert_\infty \approx 4.96 \times 10^7\)
- \(\operatorname{condest}(C^TC) \approx 6.96 \times 10^{15}\)
So this is a mixed-scale problem with a very ill-conditioned normal-equations matrix.
Timing comparison against lsqlin
I benchmarked with MATLAB + OSQP C interface using a local script named qpWork_timing.m (5-run averages).
lsqlin baseline
- Wall time: about 53.96 ms
- Iterations: 8
- \(\lVert Cx-d \rVert_2\): \(8.261228 \times 10^0\)
- \(\max(Ax-b)\): \(-9.411 \times 10^{-9}\)
OSQP (slack form)
- scaling=0, eps=1e-6:
- setup about 4.71 ms, solve about 12.39 ms, total about 17.10 ms
- solved in all runs
- \(\lVert Cx-d \rVert_2 = 8.261167 \times 10^0\)
- \(\max(Ax-b) = 8.480 \times 10^{-7}\)
- scaling=10, eps=1e-6:
- solve about 4357 ms
- max-iteration exit in all runs
- significantly worse residual quality
- scaling=0, eps=1e-8:
- total about 123.01 ms
- solved in all runs
- tighter feasibility and similar fit
- scaling=10, eps=1e-8:
- solve about 8766 ms
- max-iteration exit in all runs
What I think is happening
I do not yet have a complete explanation, but my working hypothesis is:
- OSQP scaling (Ruiz equilibration) is changing the effective ADMM residual balancing in this specific block structure.
- The slack formulation has a large-\(C\) equality block and a small-scale inequality block.
- For this case, that rescaling seems to move ADMM into a slower path, while no scaling preserves a better trajectory.
So scaling is not “bad” in general. It is just not universally beneficial.
Practical heuristic for an lsqlin-like OSQP wrapper
For this problem family, a robust strategy is:
- Use the slack formulation (avoid \(P=C^TC\) when possible).
- Pilot probe:
- run short solves with scaling=0 and scaling=10 (for example 500 iterations, eps=1e-6)
- pick the branch with better feasibility score and lower time
- Full solve with chosen branch:
- adaptive_rho=true
- scaled_termination=false
- polishing=true
- eps=1e-6 for speed, eps=1e-8 for tighter feasibility
This is the direction I plan to use in an lsqlin-like interface built on top of OSQP.
Takeaway
The surprising result here is simple: for this dataset, ADMM works much better with OSQP scaling off.
It was not intuitive to me either, but the benchmark is clear and repeatable.