# Caret:Documentation:Statistics:TFCE Implementation

(Difference between revisions)
 Revision as of 20:56, 13 August 2009 (view source)Tim (Talk | contribs)← Older edit Revision as of 04:16, 15 June 2011 (view source)Tim (Talk | contribs) (wording, rephrasing)Newer edit → Line 9: Line 9: In this integral, $e(h, p)$ is the extent (area) of the cluster containing node $p$ at threshold $h$.  If $h$ is higher than the value at node $p$, this value is zero.  Otherwise, it is the sum of the areas of node $p$ and all nodes with a value greater than $h$ that are neighbors of node $p$ or other such nodes.  The values of $E$ and $H$ determine how sensitive the equation is to larger areas or higher values, respectively.  The default values for surfaces are 1.0 for $E$ and 2.0 for $H$. In this integral, $e(h, p)$ is the extent (area) of the cluster containing node $p$ at threshold $h$.  If $h$ is higher than the value at node $p$, this value is zero.  Otherwise, it is the sum of the areas of node $p$ and all nodes with a value greater than $h$ that are neighbors of node $p$ or other such nodes.  The values of $E$ and $H$ determine how sensitive the equation is to larger areas or higher values, respectively.  The default values for surfaces are 1.0 for $E$ and 2.0 for $H$. - Since $e(h, p)$ is not known in advance, the integral can only be approximated by finding the value of $e(h, p)^Eh^H$ for each node at many thresholds.  These values are found at the top and bottom of many "slices" of the values in the metric, and to find the integral of one slice, the value of $e(h, p)^Eh^H$ is found at the top and bottom, they are averaged, and then multiplied by the difference in the h value at the top and bottom of the slice.  This is known as the trapezoidal rule. + Since $e(h, p)$ is not known in advance, the integral can only be approximated by finding the value of $e(h, p)^Eh^H$ for each node at many thresholds.  These thresholds are use to create "slices" of the integral, where each pair of consecutive thresholds forms a slice.  To find the integral of one slice, the value of $e(h, p)^Eh^H$ is computed at the top and bottom of a slice, they are averaged, and then multiplied by the difference in the threshold (h) value between the top and bottom of the slice.  This is known as the trapezoidal rule. == High Level Explanation == == High Level Explanation == - The core concept of our implementation was to grow the clusters from the top (highest values) down, thereby avoiding the need to do many expensive threshold operations.  Instead, nodes are added one at a time to a set of clusters, always adding the node with the highest value which has not yet been added.  If the node is next to one cluster, it is added.  If it is not next to a cluster, it makes its own cluster.  Finally, if it is next to more than one cluster, the clusters are merged. + The core concept of our implementation was to grow the clusters from the top (highest values) down, thereby avoiding the need to do many expensive threshold operations.  Instead, nodes are added one at a time to a set of clusters, always adding the node with the highest value which has not yet been considered.  If the node is on the border of a cluster, it is added to that cluster.  If it is not next to a cluster, it makes its own cluster.  Finally, if it is next to more than one cluster, the clusters are merged. Each cluster is separately broken into "slices" which are computed one at a time, and added to all nodes within a cluster.  When clusters merge, the slices are aligned and all slices from that point on are added to all members of the new, merged cluster, as shown in the image below. Each cluster is separately broken into "slices" which are computed one at a time, and added to all nodes within a cluster.  When clusters merge, the slices are aligned and all slices from that point on are added to all members of the new, merged cluster, as shown in the image below. Line 19: Line 19: [[Image:TFCE_slices.jpg]] [[Image:TFCE_slices.jpg]] - At first, the clusters have separate slices computed regularly (shown as yellow and blue), but just as they merge, a new slice is computed for each of them separately, so that the bottom of the slices computed for the total cluster is flat.  All following slices (green) are then computed for the merged cluster. + At first, the clusters have separate slices computed regularly (shown as yellow and blue), but just as they merge, a new slice is computed for each of them separately, so that the bottom of the slices computed for the clusters align.  All following slices (green) are then computed for the merged cluster. Once this is done for the positive values, the input values are negated, put through the same process, and negated again.  To merge them, if the input value was positive, the positive value is used, otherwise the negative value is used. Once this is done for the positive values, the input values are negated, put through the same process, and negated again.  To merge them, if the input value was positive, the positive value is used, otherwise the negative value is used.

# TFCE Implementation for Surface Metrics

## Equation Explanation

TFCE has two parts, one is the algorithm for finding clusters at different thresholds, and the other is an equation describing how they construct the output: $TFCE(p) = \int_{h_0}^{h_f}e(h, p)^Eh^Hdh$

In this integral, e(h,p) is the extent (area) of the cluster containing node p at threshold h. If h is higher than the value at node p, this value is zero. Otherwise, it is the sum of the areas of node p and all nodes with a value greater than h that are neighbors of node p or other such nodes. The values of E and H determine how sensitive the equation is to larger areas or higher values, respectively. The default values for surfaces are 1.0 for E and 2.0 for H.

Since e(h,p) is not known in advance, the integral can only be approximated by finding the value of e(h,p)EhH for each node at many thresholds. These thresholds are use to create "slices" of the integral, where each pair of consecutive thresholds forms a slice. To find the integral of one slice, the value of e(h,p)EhH is computed at the top and bottom of a slice, they are averaged, and then multiplied by the difference in the threshold (h) value between the top and bottom of the slice. This is known as the trapezoidal rule.

## High Level Explanation

The core concept of our implementation was to grow the clusters from the top (highest values) down, thereby avoiding the need to do many expensive threshold operations. Instead, nodes are added one at a time to a set of clusters, always adding the node with the highest value which has not yet been considered. If the node is on the border of a cluster, it is added to that cluster. If it is not next to a cluster, it makes its own cluster. Finally, if it is next to more than one cluster, the clusters are merged.

Each cluster is separately broken into "slices" which are computed one at a time, and added to all nodes within a cluster. When clusters merge, the slices are aligned and all slices from that point on are added to all members of the new, merged cluster, as shown in the image below.

At first, the clusters have separate slices computed regularly (shown as yellow and blue), but just as they merge, a new slice is computed for each of them separately, so that the bottom of the slices computed for the clusters align. All following slices (green) are then computed for the merged cluster.

Once this is done for the positive values, the input values are negated, put through the same process, and negated again. To merge them, if the input value was positive, the positive value is used, otherwise the negative value is used.

## Programmer's Perspective

Our implementation of TFCE for surface metrics starts by adding all positive value nodes to a max heap, ensuring every time we remove a node, there are no nodes as yet unseen with a greater value. The nodes are then removed one at a time, and considered to see if they are adjacent to a cluster, or should start a new cluster, by examining whether any of its neighbors belong to a cluster. In tracking the growing clusters, each one has a separate previously calculated threshold, signifying the last time it had a slice computed and added to the values of its members. If the node is adjacent to more than one cluster, each cluster immediately has a new slice computed at the current node's value, before merging the clusters into a larger cluster. This way, the cluster area at the "top" and "bottom" of each slice never has a large discontinuity, which could cause inconsistency in the numerical approximation of the integral, the justification being that the clusters do not truly merge until after the current threshold.

After all neighbors are examined, if the current node has a different value than the next node in the heap, and was only adjacent to one cluster, the change in height, delta h, is compared to 2 thresholds: a magnitude for the minimum step to take, in our case, 0.01, and a ratio of the previous threshold, in our case, 0.001. If delta h is smaller than either one of these values, then the slice is not computed, and the previous height is left alone. The ratio step serves to ensure numerical stability using floating points, as the precision of a floating point is approximately proportional to its own magnitude. The magnitude step ensures that as values approach zero, where they become less important, that following the minimum ratio step does not keep it from reaching the end of the computation swiftly. If the next node to be examined has the same value as the current node, a list of clusters that have grown since they were last recomputed is maintained, and the next node in the heap is considered. When the value of the current node is different then the value of the next node, all clusters in the list are then checked according to the delta h cutoffs described above, possibly recomputed, and the list is cleared. Additionally, when all nodes of the specified sign have been added to clusters, each cluster has its final (to zero) slice computed. The entire process is then repeated for the negative values by flipping them to be positive, doing the computation, and flipping them negative again. The output values are then found by taking the value for positive input values from the positive output, and the rest from the negative output.