GCFs and GCCs
Once the distance matrix has been calculated for the data set, Gene Cluster Family (GCF) assignment is performed for every cutoff distance given in the --cutoffs parameter.
The interactive visualization of the BiG-SCAPE output shows the results for the largest cutoff.
For every cutoff, BiG-SCAPE creates a network using all distances lower than or equal to the current cutoff. The Affinity Propagation clustering algorithm is then applied to each connected component (subnetwork) that emerges from this procedure. The similarity matrix for Affinity Propagation includes all distances between members of the subnetwork, i.e. it also includes those with a distance greater than the current cutoff.
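The network-building step can be illustrated with a minimal sketch. This is not BiG-SCAPE's own code: the function names, the toy BGC labels and distances, and the conversion of distance to similarity as 1 - distance are all assumptions made for illustration.

```python
from itertools import combinations


def connected_components(nodes, distances, cutoff):
    """Group nodes into components linked by distances <= cutoff."""
    # Keep only the edges at or below the cutoff.
    neighbours = {n: set() for n in nodes}
    for (a, b), d in distances.items():
        if d <= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        components.append(sorted(component))
    return components


def similarity_matrix(component, distances):
    """Full pairwise similarity matrix for one component.

    Every pair is included, even pairs whose distance exceeds the cutoff.
    Similarity is taken as 1 - distance (an assumption of this sketch).
    """
    n = len(component)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        key = tuple(sorted((component[i], component[j])))
        sim[i][j] = sim[j][i] = 1.0 - distances[key]
    return sim


# Hypothetical BGCs and pairwise distances, keyed by sorted name pairs.
bgcs = ["A", "B", "C", "D"]
distances = {
    ("A", "B"): 0.25, ("B", "C"): 0.25, ("A", "C"): 0.5,
    ("A", "D"): 0.9, ("B", "D"): 0.95, ("C", "D"): 0.85,
}

components = connected_components(bgcs, distances, cutoff=0.3)
sim_abc = similarity_matrix(components[0], distances)
```

In this toy data, A and C are not directly connected at a cutoff of 0.3, but they land in the same component through B, so their distance of 0.5 still enters the similarity matrix handed to the clustering step.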
The Gene Cluster Clan (GCC) setting (enabled by default) performs a second layer of clustering on the GCFs. For this, Affinity Propagation is applied again (i.e. on each connected component of a network), but the network nodes are now the GCFs defined at the cutoff level specified by the first value of the --clan_cutoff parameter (default: 0.3). Clustering is applied to the network of all GCFs connected by a distance lower than or equal to the GCC cutoff, which is the second value of the --clan_cutoff parameter (default: 0.7); larger distances are discarded. The inter-GCF distance is calculated as the average distance between the BGCs of both families.
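As a rough illustration of the inter-GCF distance, the sketch below averages the BGC distances over all cross-family pairs and keeps only the GCF pairs at or below the GCC cutoff. Reading "average" as the mean over all cross-family pairs is an assumption, and the function names, family labels and distances are hypothetical.

```python
from itertools import combinations, product
from statistics import mean


def inter_gcf_distance(family_a, family_b, distances):
    """Mean BGC-to-BGC distance over all pairs spanning the two families."""
    pairs = product(family_a, family_b)
    return mean(distances[tuple(sorted(pair))] for pair in pairs)


def clan_edges(families, distances, gcc_cutoff=0.7):
    """Edges of the GCF network: family pairs with distance <= gcc_cutoff."""
    edges = {}
    for name_a, name_b in combinations(sorted(families), 2):
        d = inter_gcf_distance(families[name_a], families[name_b], distances)
        if d <= gcc_cutoff:  # larger distances are discarded
            edges[(name_a, name_b)] = d
    return edges


# Hypothetical GCFs (as they might look at the first --clan_cutoff value)
# and pairwise BGC distances, keyed by sorted name pairs.
families = {"GCF_1": ["A", "B"], "GCF_2": ["C"], "GCF_3": ["D"]}
bgc_distances = {
    ("A", "B"): 0.25, ("A", "C"): 0.5, ("B", "C"): 0.375,
    ("A", "D"): 0.875, ("B", "D"): 1.0, ("C", "D"): 0.75,
}

gcf_edges = clan_edges(families, bgc_distances, gcc_cutoff=0.7)
```

Here only GCF_1 and GCF_2 remain connected, so GCF_3 would form a clan of its own in this toy data.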
Affinity Propagation parameters used in both clustering layers: damping=0.9, max_iter=1000, convergence_iter=200.
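The parameter names above match those of scikit-learn's AffinityPropagation class, so a clustering call with these values might look like the sketch below. It assumes scikit-learn and NumPy are installed. The toy distance matrix, the 1 - distance similarity, the default preference and the fixed random_state are choices made for this example, not settings confirmed by the text above.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Hypothetical symmetric distance matrix for six BGCs:
# indices 0-2 are close to each other, as are indices 3-5.
distance = np.array([
    [0.00, 0.10, 0.15, 0.90, 0.85, 0.95],
    [0.10, 0.00, 0.12, 0.88, 0.92, 0.80],
    [0.15, 0.12, 0.00, 0.93, 0.87, 0.91],
    [0.90, 0.88, 0.93, 0.00, 0.08, 0.14],
    [0.85, 0.92, 0.87, 0.08, 0.00, 0.11],
    [0.95, 0.80, 0.91, 0.14, 0.11, 0.00],
])

# Affinity Propagation expects similarities, so convert the distances
# (1 - distance is an assumption of this sketch).
similarity = 1.0 - distance

model = AffinityPropagation(
    damping=0.9,
    max_iter=1000,
    convergence_iter=200,
    affinity="precomputed",  # the input is already a similarity matrix
    random_state=0,          # added here only for reproducibility
)
labels = model.fit(similarity).labels_

# Collect the member indices of each resulting cluster.
clusters = {}
for index, label in enumerate(labels):
    clusters.setdefault(int(label), []).append(index)
```

The high damping value slows the message updates, which tends to reduce oscillation at the cost of more iterations, and the larger max_iter and convergence_iter leave room for that.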