Geometric Separation by
SinglePass Alternating Thresholding
Abstract.
Modern data is customarily of multimodal nature, and analysis tasks typically require separation into the single components. Although a highly illposed problem, the morphological difference of these components sometimes allow a very precise separation such as, for instance, in neurobiological imaging a separation into spines (pointlike structures) and dendrites (curvilinear structures). Recently, applied harmonic analysis introduced powerful methodologies to achieve this task, exploiting specifically designed representation systems in which the components are sparsely representable, combined with either performing minimization or thresholding on the combined dictionary.
In this paper we provide a thorough theoretical study of the separation of a distributional model situation of point and curvilinear singularities exploiting a surprisingly simple singlepass alternating thresholding method applied to the two complementary frames: wavelets and curvelets. Utilizing the fact that the coefficients are clustered geometrically, thereby exhibiting clustered/geometric sparsity in the chosen frames, we prove that at sufficiently fine scales arbitrarily precise separation is possible. Even more surprising, it turns out that the thresholding index sets converge to the wavefront sets of the point and curvilinear singularities in phase space and that those wavefront sets are perfectly separated by the thresholding procedure. Main ingredients of our analysis are the novel notion of cluster coherence and clustered/geometric sparsity as well as a microlocal analysis viewpoint.
Key words and phrases:
Thresholding. Sparse Representation. Mutual Coherence. Tight Frames. Curvelets, Shearlets, Radial Wavelets. Wavefront Set1. Introduction
Along with the deluge of data we face today, it is not surprising that the complexity of such data is also increasing. One instance of this phenomenon is the occurrence of multiple components, and hence, analyzing such data typically involves a separation step. One most intriguing example comes from neurobiological imaging, where images of neurons from Alzheimer infected brains are studied with the hope to detect specific artifacts of this disease. The prominent parts of images of neurons are spines (pointlike structures) and dendrites (curvelike structures), which require separate analyzes, for instance, counting the number of spines of a particular shape, and determining the thickness of dendrites [31, 34].
From an educated viewpoint, it seems almost impossible to extract two images out of one image; the only possible attack point being the morphological difference of the components. The new paradigm of sparsity, which has lately led to some spectacular successes in solving such underdetermined systems, does provide a powerful means to explore this difference. The main sparsitybased approach towards solving such separation problems consists in carefully selecting two representation systems, each one providing a sparse representation of one of the components and both being incoherent with respect to the other – the encoding of the morphological difference –, followed by a procedure which generates a sparse expansion in the dictionary combining the two representation systems. This intuitively automatically forces the different components into the coefficients of the ‘correct’ representation system.
Browsing through the literature, the two main sparsitybased separation procedures can be identified to be minimization (see, e.g., [2, 15, 16, 17, 19, 20, 21, 22, 23, 27, 36, 37, 38, 40]) and thresholding (see, e.g., [1, 21, 32, 33]). For general papers on minimization techniques we refer to [7, 9, 14, 13, 12] and thresholding to [39] or the reference list in the beautiful survey paper [3]. While minimization has produced very strong theoretical results, thresholding is typically significantly harder to analyze due to its iterative nature. However, thresholding algorithms are in general much faster than minimization, which makes them particularly attractive for the aforementioned neurobiological imaging application due to its large problem size.
In this paper we focus on thresholding as a separation technique for separating point from curvelike structures using radial wavelets and curvelets; in fact, we study the very simple technique of singlepass alternating thresholding, which expands the image in wavelets, thresholds and reconstructs the point part, then expands the residual in curvelets, thresholds and reconstructs the curve part. In this paper we aim for a fundamental mathematical understanding of the precision of separation allowed by this thresholding method. Interestingly, our analysis requires the notions of cluster coherence and clustered/geometrical sparsity, which were introduced in [18] by the author and Donoho in the context of analyzing minimization as a separation methodology.
We find the results in our paper quite surprising in two ways. First, the thresholding procedure we consider is very simple, and researchers on thresholding algorithms might at first sight dismiss such singlepass alternating thresholding methodology. Therefore, it is intriguing to us, that we derive a quite similar perfect separation result (Theorem 1.1) as in our paper [18], where minimization as a separation technique was analyzed. Secondly, to our mind, it is even more surprising that in Theorems 1.2 and 1.3 we derive even more satisfying results by showing that the thresholding index sets converge to the wavefront sets of the point and curvilinear singularities in phase space and that those wavefront sets are perfectly separated by the thresholding procedure. This, we already suspected for minimization to be true. However, we are not aware of any analysis tools strong enough to derive these results for separation by minimization.
1.1. A Geometric Separation Problem
Let us start by defining the following simple but clear model problem of geometric separation (compare also the problem posted in [18]). Consider a ‘pointlike’ object made of point singularities:
(1.1) 
This object is smooth away from the given points . Consider as well a ‘curvelike’ object , a singularity along a closed curve :
(1.2) 
where is the usual Dirac delta function located at . The singularities underlying these two distributions are geometrically quite different, but the exponent is chosen so the energy distribution across scales is similar; if denotes the annular region ,
This choice makes the components comparable as we go to finer scales; the ratio of energies is more or less independent of scale. Separation is challenging at every scale.
Now assume that we observe the ‘Signal’
however, the component distributions and are unknown to us.
Definition 1.1.
As there are two unknowns ( and ) and only one observation (), the problem seems improperly posed. We develop a principled, rational approach which provably solves the problem according to clearly stated standards.
1.2. Two Geometric Frames
We now focus on two overcomplete systems for representing the object :

Radial Wavelets – a tight frame with perfectly isotropic generating elements.

Curvelets – a highly directional tight frame with increasingly anisotropic elements at fine scales.
We pick these because, as is well known, point singularities are coherent in the wavelet frame and curvilinear singularities are coherent in the curvelet frame. In Section 1.5 we discuss other system pairs. For readers not familiar with frame theory, we refer to [10, 8], where terms like ‘tight frame’ – a Parsevallike property – are carefully discussed.
The point and curvelike objects we defined in the previous subsection are realvalued distributions. Hence, for deriving sparse expansions of those, we will consider radial wavelets and curvelets consisting of realvalued functions. So only angles associated with radians will be considered, which later on we will, as is customary, identify with , the real projective line.
We now construct the two selected tight frames as follows. Let be an ‘appropriate’ window function, where in the following we assume that belongs to and is compactly supported on while being the Fourier transform of a wavelet. For instance, suitably scaled LemarièMeyer wavelets possess these properties. We define continuous radial wavelets at scale and spatial position by their Fourier transforms
The wavelet tight frame is then defined as a sampling of on a series of regular lattices , , where , i.e., the radial wavelets at scale and spatial position are given by the Fourier transform
where we let index position and scale.
For the same window function and a ‘bump function’ , we define continuous curvelets at scale , orientation , and spatial position by their Fourier transforms
See [4, 5] for more details. The curvelet tight frame is then (essentially) defined as a sampling of on a series of regular lattices
where is planar rotation by radians, , , , and is anisotropic dilation by , i.e., the curvelets at scale , orientation , and spatial position are given by the Fourier transform
where let index scale, orientation, and scale. (For a precise statement, see [6, Section 4.3, pp. 210211]).
Roughly speaking, the radial wavelets are ‘radial bumps’ with position and scale , while the curvelets live on anisotropic regions of width and length . The wavelets are good at representing point singularities while the curvelets are good at representing curvilinear singularities.
Using the same window , we can construct a family of filters with transfer functions
These filters allow us to decompose a function into pieces with different scales, the piece at subband arises from filtering using :
the Fourier transform is supported in the annulus with inner radius and outer radius . Because of our assumption on , we can reconstruct the original function from these pieces using the formula
We now apply this filtering to our known image , obtaining the truly geometric decomposition
for each scale .
For future use, let denote the collection of indices of wavelets at level , and let denote the indices of curvelets at level .
1.3. Separation via Thresholding
We now consider a simple ‘one–stepthresholding’ method – which we also refer to as ‘single pass alternating thresholding’ method – formalizing the first few steps of a recipe for separation pointed out by Coifman and Wickerhauser [11, Fig. 26(ah)] (cf. also [18]). It is formally specified in Figure 1.
OneStep is a very simple, easily implementable way to approximately decompose the signal into purported pointlike and curvelike parts. Currently popular thresholding algorithms are usually far more complex than OneStep : they apply similar operations multiple times, with stopping rules, threshold adaptation, etc. It therefore may be surprising that this very simple noniterative algorithm, with nonadaptive threshold, also works well. The thresholds are even almost chosen as if the data wouldn’t be composed at all: The first threshold is chosen coarsely below the decay rate of significant wavelet coefficients of the ‘naked’ point singularity ; the second threshold is chosen just slightly below the decay rate of significant curvelet coefficients of the ‘naked’ curvilinear singularity . Notice that we threshold the wavelet component more aggressively; and we refer to Section 2.3 for more precise heuristics on the choice of these two thresholds. It comes as a second surprise that our estimates as well as the framework of geometric separation are strong enough to survive this ‘brutally simple’ thresholding strategy, as it is shown in the following result as well as Theorems 1.2 and 1.3.
For the following result, which will be proven in Section 5.3, we continue to suppose the sequence is known; thus the ideal decomposition into a pointlike and curvelike part would be given by . We apply OneStep , which outputs approximations and to and , respectively.
Theorem 1.1.
Asymptotic Separation via OneStep Thresholding.
It is wellknown that minimization and thresholding are closely connected in various ways. In the past few years it has been frequently found that results on successful minimization subsequently inspired parallel results on thresholding methods. In a particular sense, this happened here as well; after obtaining an asymptotic separation result using minimization (cf. [18]), we found a similar result for this surprisingly simple thresholding procedure. However, even more intriguingly, when performing this thresholding procedure – as opposed to minimization – we are able to even derive much more satisfying results than Theorem 1.1, which we turn our attention to now.
1.4. Wavefront Set Separation
The very simplicity of OneStep makes it possible to analyze delicate phenomena which do not seem analytically tractable for iterative thresholding or even for the minimization problem considered in [18].
The geometric separation model we have been studying is distinguished by the behavior of its singularities. One might hope that the two purported geometric components and , defined by
have exactly the singularities that one expects. To articulate this goal requires the notions of wavefront set and phase space from microlocal analysis, which are reviewed below and in Section 2. Intuitively, phase space is the collection of location/direction pairs and the wavefront set of a distribution is the subset of phase space where exhibits singularities. Point singularities are omnidirectional, while curvilinear singularities point in one direction.
Theorem 1.1 shows that the distributions and can be arbitrarily well approximated by thresholding – a similar result was derived in our companion paper [18] for minimization. However, the most desirable and also rhetorically effective matching condition would be an arbitrarily perfect approximation also of the associated wavefront sets and .
Surprisingly, we derive two results in this direction for OneStep – one on the ‘analysis’ side and the other on the ‘synthesis’ side. The first result shows that the wavefront sets of and can indeed be approximated with arbitrary high precision by the significant thresholding coefficients and . As a measure of distance we employ the nonsymmetric Hausdorffstyle distance , say, in phase space measuring the largest distance from any point of a subset of phase space to the closest corresponding point of a different subset . As a second result, we prove that the wavefront sets of the synthesized objects and coincide with and , respectively. We might interpret this result as recovering and from the composed image , hence in this sense we do not only separate the pointlike structures from the curvelike structures, but even more separate their wavefront sets.
For a precise statement of the aforementioned two results, we require to introduce some notions from microlocal analysis, which will be our main analysis methodology. Phase space is the space of all direction/location pairs , where and the orientational component will be regarded as an element in , the real projective space^{1}^{1}1Here we identify with and freely write one or the other in what follows. It may at first seem more natural to think of directions rather than orientations , note however that in this paper we consider realvalued distributions measured by realvalued curvelets so directions are not resolvable, only orientations. We also frequently abuse notation as follows: we will write when what is actually meant is geodesic distance between two points on . in .
Since radial wavelets are oriented in all directions, we denote the set of significant phase space pairs produced by the wavelet component of algorithm OneStep by
(1.3) 
the set of significant phase space pairs for the curvelet component of OneStep is:
(1.4) 
We further require the notion of a metric in phase space, which we choose to be
and its associated asymmetric distance
Section 6 then proves the following theorem.
Theorem 1.2.
Approximation of the Wavefront Sets.
In short, the significant coefficients in each purported geometric component cluster increasingly around the wavefront set of the underlying ‘true’ geometric component. We further derive the following result (proved in Section 7).
Theorem 1.3.
Separation of the Wavefront Sets.
This implies that the wavefront sets of the reconstructed components are precisely what we might hope for.
1.5. Extensions
We would like to point out that the analysis of OneStep for solving the special separation problem we focus on in this paper, gives rise to very extensive generalizations and extensions; a few examples are stated in the sequel.

More General Classes of Objects. Theorems 1.1–1.2 can be generalized to other situations. First, we could consider singularities of different orders. This would allow to model ‘cartoon’ images, where the curvilinear singularities are now the boundaries of the pieces for piecewise functions. Second, we can allow smooth perturbations, i.e., where are smooth functions of rapid decay at . In this situation, we let the denominator in Theorem 1.1 be simply .

Rate of Convergence. Theorem 1.1 can be accompanied by explicit decay estimates.
2. Microlocal Analysis Viewpoint
The morphological difference between the two structures we intend to extract – points and curve – is the key to separation. In the section we will describe why heuristically this key issue makes separation possible as well as present our main means to choose the ‘correct’ thresholds.
2.1. Point and Curvelike Structures in Phase Space
Our intuition as well as hard analysis is based on a microlocal analysis viewpoint, which through the notion of wavefront sets will allow us to, roughly speaking, include the morphology of the structures by adding a third dimension to spatial domain. Let us start by recalling the notion of wavefront set and – related with this – the notion of singular supports and phase space. The singular support of a distribution , , is defined to be the set of points where is not locally . The notion of wavefront set then goes beyond the classical spatial domain picture and extends it to phase space, which consists of positionorientation pairs ; see the more detailed discussion in Section 1.4. The wavefront set lives in this phase space and can be coarsely described as the set of positionorientation pairs at which is nonsmooth; for more details, see: [25, 5, 29].
To illustrate these notions and also prepare our heuristic argument why separation through thresholding is possible, we first consider the distribution . A short computation shows that
which can be regarded as a manifestation of the isotropic nature of the point singularities. Illustrations of and of are presented in Figure 2.
2.2. Wavelets and Curvelets in Phase Space
Although being smooth functions, in a certain sense, wavelets and curvelets can be regarded as leaving an approximate footprint in phase space. To make this statement rigorous, we first observe the approximate footprint in spatial domain left by wavelet and curvelets as detailed in the following two lemmata taken from [18]. As expected, these observations show the isotropic nature of wavelets in contrast to the anisotropic nature of curvelets.
Lemma 2.1 ([18]).
For each there is a constant so that
Lemma 2.2 ([18]).
For each there is a constant so that
Since it is known from [5] that the continuous curvelet transform precisely resolves the wavefront set of distributions, we might consider the image of wavelets and curvelets under the continuous curvelet transform for ‘sufficiently small’ scale as a footprint of these in phase space. An illustration is given in Figure 4, and for a detailed description we refer the interested reader to [18].
Visually, wavelets are perfectly adapted to strongly react to in a similar way as curvelets will strongly react to . This will be now made precise and will lead to the chosen thresholds for separation.
2.3. Road Map to the ‘Correct’ Thresholds
To slowly approach a rigorous phrasing of the aforementioned strong reaction, we first consider the reaction of both wavelets to and . A simplified form of Lemma 3.2 states that,
(2.1) 
with fast decay^{2}^{2}2As it is custom, we refer to the behavior as for all as fast decay. for other locations than , and Lemma 3.3 shows that, for each and ,
(2.2) 
Secondly, turning our attention to curvelets and their reaction to and , we observe that, by a simplified form of Lemma 4.4, for each ,
(2.3) 
and, for positioned on the curve and pointing in the direction perpendicular to the tangent to the curve in ,
(2.4) 
Examining closely (2.1) and (2.2), it becomes immediately evident that the correct first threshold – which should capture by thresholding wavelet coefficients of – need to be chosen ‘slightly higher’ than a constant asymptotically, wherefore we choose it equal to for small .
The second threshold seem to be a somehow more serious problem, since (2.3) and (2.4) show that curvelets react stronger to a point singularity than a curvilinear singularity. However, we wish the reader to keep in mind that ideally all energy from is already captured during the first thresholding procedure. Hence, it should presumably be ‘safe’ to choose the second threshold – which shall capture by thresholding curvelet coefficients of the residual – with asymptotic behavior . To avoid unnecessary risks, we choose it only slightly below , more precisely, equal to .
2.4. What Type of Separation Result is Preferable?
Applying now OneStep (cf. Figure 1) yields significant coefficient sets and and approximations to and : and . In Sections 1.3 and 1.4, we presented three theorems on the ‘quality’ of this separation, which we would now like to discuss and compare.
Theorem 1.1 studies the relative separation error and proves that asymptotically this error can be made arbitrarily small for sufficiently fine scale. This is in a sense the most natural question to ask, and the theorem provides the answer one would hope for.
However, from a microlocal analysis viewpoint, the most satisfying separation to derive would be the perfect separation of the wavefront sets of and , i.e., to separate the LHS of Figure 4 into the RHS of Figures 2 and 3. This would be considerably ‘stronger’ than Theorem 1.1 in the following sense: Once the wavefront sets are extracted, we have complete information about the underlying singularities, in contrast to the merely asymptotic knowledge provided by Theorem 1.1.
Knowledge about and could be either coming from and or from and . The sets of significant coefficients generated by thresholding do not provide an immediate means for separating the wavefront sets, since they live on the analysis side (as opposed to the synthesis side). Astonishingly, they are still able to precisely locate the wavefront sets and , more precisely, they ‘converge’ to the wavefront sets in phase space measured in the phase space norm as , which is the statement of Theorem 1.2. This shows that the points in and are located in tubes around and , respectively, which become more concentrated around these wavefront sets as the scale becomes finer. The corresponding objects on the synthesis side, i.e., and , now allow separation of and , in the sense that the wavefront sets of the reconstructed distributions and precisely coincide with and . This is the content of Theorem 1.3.
3. Geometry of the Thresholded Wavelet Coefficients
Following the ordering of the thresholding, we first focus on the set of significant radial wavelet coefficients generated by Step 1) of OneStepThresholding (see Figure 1), in particular, on its phase space footprint, defined in (1.3) as
Our objective will be to derive a tube around in phase space with controllable ‘size’. This tube should therefore be a neighborhood of , and hence be isotropic.
For our analysis, we first notice that WLOG we can assume that
(3.1) 
From here, the result for the original as defined in (1.1) can be concluded because of the following reasons: Firstly, all results are translation invariant, hence instead of the origin the results follow immediately for a different point in spatial domain; and secondly, the change from one point to finitely many points just introduces a constant independent on .
3.1. Estimates for Wavelet Coefficients
We start by analyzing the interaction of wavelet atoms. The technical proof of the following result is provided in Section 8.1
Lemma 3.1.
For each , there is a constant so that
Next, we recall a result derived in [18] for radial wavelet coefficients of our point singularity (3.1).
Lemma 3.2 ([18]).
For each , there is a constant so that
In the sequel, we will further require an estimate of the wavelet coefficients of the curvilinear singularity . Notice that the following estimate does only provide a very coarse upper bound. In order to derive a more detailed estimate, the curve would need to be much more carefully analyzed as it will be done in Section 4. However, the estimate as stated below is all we will require.
Lemma 3.3.
There exists a constant so that
Proof. By Lemma 2.1 and the definition of the distribution ,
(3.2) 
WLOG we assume that with , say, and we can also assume that . Choosing a ball around with chosen arbitrarily small (yet, independent of ), there exists some such that
This information is now used to split the last integral in (3.2) according to
(3.3) 
For estimating , we first observe that it is sufficient to consider due to symmetry reasons. For small enough, the curve inside can be arbitrarily well approximated by its osculating circle with its center denoted by . Combining these considerations as well as exploiting the approximation by a Taylor series for cosine,
(3.4)  
Using the definition of , the integral can be easily estimated as
(3.5) 
Summarizing, by (3.2)–(3.5), there exists some constant (independent on and ) such that
The lemma is proved. ∎
3.2. Geometry of
We now first analyze the set by the following two lemmata.
Lemma 3.4.
Let , and let be sufficiently large. Then, for each , there is a constant so that
Proof. Let be such that
Since by Lemma 3.3, , and hence , is bounded by a constant , say, we have
Next we use the estimate in Lemma 3.2 as a model to conclude that
Thus, since for sufficiently large , we have ,
Letting be large enough so that proves the lemma. ∎
Lemma 3.5.
Let . Then, for each , there is a constant so that
Proof. Let be such that
Since by Lemma 3.3, , and hence , is bounded by a constant , say, we have
Next we use the estimate in Lemma 3.2 as a model to conclude that
Thus, since for sufficiently large , we have ,
The lemma is proved. ∎
We certainly hope (and expect) that the threshold is set in such a way that the wavefront set of is contained in . This is obviously the first requirement for being able to separate both wavefront sets and through SinglePass Alternating Thresholding (compare Theorem 1.3). The next result shows that this is indeed the case.
Lemma 3.6.
For sufficiently large,
Hence, in particular,
Proof. By Parseval,
Hence, we can conclude that, for sufficiently large,
This proves the first claim.
For the ‘in particular’part, recall that . By Lemma 3.3 and the previous consideration, for sufficiently large ,
The lemma is proved. ∎
4. Geometry of the Thresholded Curvelet Coefficients
This section now aims to derive a fundamental geometric understanding of the cluster of curvelet coefficients generated by Step 3) of OneStepThresholding of the residual generated in Step 2) (see Figure 1). The phase space geometry will play an essential role in setting up the analysis correctly, hence it will be beneficial to study the projection of onto phase space, defined in (1.4), as
Morally, the points in phase space associated with significant curvelet coefficients, given by , are contained in a tube around in phase space. The main objective will now be to explicitly define such a tube around the phase space footprint of this cluster, where we have more control on. This will become crucial for handling the thresholded curvelet coefficients in the proofs of Theorems 1.1–1.3.
4.1. Bending the Curve
We first face the problem of how to deal with the curvilinear singularity. In [18], this problem was tackled by carefully and smoothly breaking the curve into pieces, bending each piece, and then combining pieces in the end. This technique shall also be applied here. For the convenience of the reader, we review the main ideas of this particular approach.
First, a quantitative ‘tubular neighborhood theorem’ is being developed to allow local bending of the curve. Due to regularity of the curve, there exists some small compared to the curvature of , so that
Now consider the following local coordinate system in the vicinity of . Let , for with , since is closed. Then we have the following
Lemma 4.1 ([18]).
(Tubular Neighborhood Theorem) For sufficiently small , there is some so that, for , we have:

for each , there exists a tube around and an associated diffeomorphism ,

the mapping extends to a diffeomorphism from to which reduces to the identity outside a compact set.
Thus, the set is a tubular neighborhood of on which we have nice local coordinate systems, see Figure 5. This will allow us to locally bend the curve . From now on, always denotes the extended diffeomorphism from to .
Next, choose a function supported in satisfying
(4.1) 
and
Define a smooth partition of unity of using by
and accordingly the distributions
the partition of unity property giving .
Now consider the action of on the distribution
This action induces a linear transformation on the space of curvelet coefficients. With the curvelet coefficients of and the curvelet coefficients of , we obtain a linear operator
It is by now wellknown that diffeomorphisms preserve sparsity of frame coefficients when the frame is based on parabolic scaling (as with curvelets and shearlets), e.g., by [35] (see also [6, Theorem 6.1, page 219]), for any ,
After having carefully bended the curve pieces, we can also reserve the process and glue them together. Choosing, e.g., , from the decomposition we have
This decomposition allows us to relate sparsity of coefficients of the linear singularity to those of the curvilinear singularity:
(4.2) 
Finally, we define the very special distribution we will consider, supported on a line segment by
Then we can write
where
Thus the action of on a continuous function is given by
Conceptually, is a straight curve fragment, which the approach taken in [18] reduced the analysis of to.
Concluding this approach enables us to consider curvelet coefficients of a linear singularity instead of a curvilinear singularity with a linear operator mapping one coefficient set onto the other.
4.2. Estimates for Curvelet Coefficients
We start by estimating the interaction of curvelet atoms and the interaction of a curvelet atom with a wavelet atom. These results are proved in [18].
Lemma 4.2 ([18]).
For each , there is a constant so that
Lemma 4.3 ([18]).
For each , there is a constant so that