Gclust: A Parallel Clustering Tool for Microbial Genomic Data
Sequence file and databases
The maximum size of uploaded files is 300MB.
Load query sequences in FASTA format from your computer:
Clustering cutoff
-memiden
<int>
Set the extended or non-extended maximal exact match (MEM) identity threshold for clustering, default = 90
:
Algorithm Options
-minlen
<int>
Set the minimum length of an exact match, default = 20
:
-both
<no-args>
Compute forward and reverse complement matches, default = forward only
:
yes
no
-nuc
<no-args>
Match only the characters a, c, g, or t
:
yes
no
-sparse
<int>
Set the step of the sparse suffix array, default = 1
:
-threads
<int>
Set the number of threads to use, default = 1
:
-chunk
<int>
Set the chunk size for one round of clustering, in million base pairs (Mbp), default = 100
:
-nchunk
<int>
Set the number of chunks loaded at one time for aligning the remaining genomes, default = 2
:
-loadall
<int>
Load all genomes at once
:
yes
no
-rebuild
<int>
Rebuild the suffix array after clustering into one chunk, default = 1
:
yes
no
The typical step of the sparse suffix array ('-sparse') is 1 to 4, and it must always be less than the '-minlen' length.
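To make the step constraint concrete, the following is a minimal sketch of a sparse suffix array that indexes only every `step`-th suffix. It illustrates the general data structure, not Gclust's internal implementation; the function name `sparse_suffix_array` is hypothetical. Because only positions that are multiples of the step are indexed, a match shorter than the step could fall entirely between two sampled positions, which is one way to see why the step should stay below the minimum match length.

```python
def sparse_suffix_array(text: str, step: int) -> list[int]:
    """Return the start positions of every `step`-th suffix of `text`,
    sorted by the lexicographic order of the suffixes they start.

    Illustrative only: a naive O(n^2 log n) construction, fine for a demo.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    sampled_positions = range(0, len(text), step)
    return sorted(sampled_positions, key=lambda i: text[i:])


genome = "acgtacgtac"
full = sparse_suffix_array(genome, 1)    # indexes all 10 suffixes
sparse = sparse_suffix_array(genome, 2)  # indexes suffixes at 0, 2, 4, 6, 8
print(len(full), sparse)
```

With a step of 2 the index holds half as many entries as the full array, which is the memory saving that a larger step buys.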
To avoid spurious matches, set a minimum length for the maximal exact matches (MEMs) with '-minlen'. The smaller the value, the longer the program takes to run.
If you add the '-both' option, Gclust generates the reverse complement of each sequence and computes both forward and reverse complement matches. Otherwise, it computes only the forward matches.
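For readers unfamiliar with the term, a reverse complement can be sketched in a few lines. This is a generic illustration, not Gclust code, and the helper name `reverse_complement` is hypothetical; characters other than a, c, g, and t are left unchanged here as a simplifying assumption.

```python
# Map each nucleotide to its complement, in both lower and upper case.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


forward = "aacg"
reverse = reverse_complement(forward)
print(forward, reverse)
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.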
If you add the '-nuc' option, Gclust matches only the characters a, c, g, or t.
Because the parallel speedup of Gclust levels off as threads are added, we recommend using no more than 16 threads.
For example, '-chunk 100' means that Gclust clusters the genomes in chunks of 100 Mbp at a time.
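The idea of processing genomes in fixed-size chunks can be sketched as follows. This is an assumption-laden illustration: it packs sequences greedily in input order, and Gclust's actual partitioning scheme may differ. The function name `split_into_chunks` and the example lengths are hypothetical.

```python
def split_into_chunks(lengths: list[int], chunk_mbp: int = 100) -> list[list[int]]:
    """Group sequence indices into consecutive chunks whose total length
    stays within `chunk_mbp` million base pairs.

    A sequence longer than the limit is placed in a chunk of its own.
    """
    limit = chunk_mbp * 1_000_000
    chunks: list[list[int]] = []
    current: list[int] = []
    size = 0
    for index, length in enumerate(lengths):
        # Start a new chunk when adding this sequence would exceed the limit.
        if current and size + length > limit:
            chunks.append(current)
            current, size = [], 0
        current.append(index)
        size += length
    if current:
        chunks.append(current)
    return chunks


# Hypothetical genome lengths in base pairs.
genome_lengths = [60_000_000, 50_000_000, 40_000_000, 120_000_000]
chunks = split_into_chunks(genome_lengths, chunk_mbp=100)
print(chunks)
```

A smaller chunk size lowers peak memory per round at the cost of more rounds.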
Because computer memory is limited, we recommend keeping the default '-nchunk' value.
If you add the '-loadall' option, all genomes are loaded into memory.
If you add the '-rebuild' option, the suffix array is rebuilt after clustering into one chunk.
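The clustering cutoff and algorithm options above could be assembled into a command line roughly as follows. This is a sketch under stated assumptions: the executable name `gclust` and the position of the FASTA file as the last argument are not given on this page, and the helper `build_gclust_command` is hypothetical. '-loadall' and '-rebuild' are left out because the form lists them as `<int>` but offers yes/no choices, so their exact command-line form is unclear here.

```python
import shlex


def build_gclust_command(fasta_path: str, *, memiden: int = 90, minlen: int = 20,
                         sparse: int = 1, threads: int = 1, chunk: int = 100,
                         nchunk: int = 2, both: bool = False, nuc: bool = False,
                         executable: str = "gclust") -> list[str]:
    """Build an argument list from the options described on this page.

    Assumptions: the executable name and the trailing input-file argument.
    """
    # The page states that the sparse step must be less than '-minlen'.
    if not 1 <= sparse < minlen:
        raise ValueError("-sparse must be at least 1 and less than -minlen")
    args = [executable]
    for flag, value in (("-memiden", memiden), ("-minlen", minlen),
                        ("-sparse", sparse), ("-threads", threads),
                        ("-chunk", chunk), ("-nchunk", nchunk)):
        args += [flag, str(value)]
    if both:
        args.append("-both")  # switch without an argument
    if nuc:
        args.append("-nuc")   # switch without an argument
    args.append(fasta_path)
    return args


command = build_gclust_command("genomes.fasta", threads=8, both=True, nuc=True)
print(shlex.join(command))
```

Returning a list rather than a single string lets the result be passed directly to `subprocess.run` without shell quoting concerns.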
Extension options of MEM
-ext
<int>
Set the extension type of MEM, where '0' means no extension, '1' means gapped extension and '2' means un-gapped extension, default = 1
:
0
1
2
-mas
<int>
Set the reward value for a nucleotide match, default = 1
:
-umas
<int>
Set the penalty value for a nucleotide mismatch, default = -1
:
-gapo
<int>
Set the cost value to open a gap, default = -1
:
-gape
<int>
Set the cost value to extend a gap, default = -1
:
-drops
<int>
Set the X dropoff value for extension, default = 1
:
Note that '-ext' must be used together with the '-loadall' option.
For '-mas', '-umas', '-gapo', '-gape', and '-drops', we recommend keeping the default values.
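To show how a match reward, a mismatch penalty, and an X dropoff value interact, here is a minimal sketch of ungapped X-drop extension to the right of a seed. It illustrates the general technique used by seed-and-extend aligners, not Gclust's own extension code. The function name `extend_right_ungapped` is hypothetical, and the stopping rule (stop once the score falls more than the dropoff below the best score seen) is one common convention, assumed here.

```python
def extend_right_ungapped(a: str, b: str, i: int, j: int, match: int = 1,
                          mismatch: int = -1, xdrop: int = 1) -> tuple[int, int]:
    """Extend an alignment rightward from a[i] and b[j] without gaps.

    Returns (best_score, best_length). Extension stops at the end of either
    sequence or when the running score drops more than `xdrop` below the
    best score seen so far.
    """
    score = best = best_len = 0
    k = 0
    while i + k < len(a) and j + k < len(b):
        score += match if a[i + k] == b[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score > xdrop:
            break
    return best, best_len


# One mismatch at position 3; defaults mirror the form (1, -1, dropoff 1).
tolerant = extend_right_ungapped("acgtacgt", "acgaacgt", 0, 0)
strict = extend_right_ungapped("acgtacgt", "acgaacgt", 0, 0, xdrop=0)
print(tolerant, strict)
```

With a dropoff of 1 the extension passes the single mismatch and covers all eight positions; with a dropoff of 0 it stops at the mismatch and keeps only the first three.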
Email address for job checking
Enter your email address:
A link to the execution status of your submitted job will be sent to the email address you provide.
Submit
Reset