Update for clustering module
- Clusters start with #1 instead of #0.
- Added the ability to select all the sequences of a cluster and send them to the alignment window
- Improved the Excel clustering report (title, headers, full-length consensus sequence)
- Added tooltips for sequence selection and sorting.
After many new functionalities were added over the last few years, Scaligner’s source code has been reviewed in depth to provide a more consistent and user-friendly display of the information.
- A new option has been added to prevent frameshifts from being fixed automatically during the processing of sequences. When fixes are allowed (the behavior until 3.23), some nucleotides or amino acids can be removed from the sequence to optimize the alignment to the germline. While this functionality was developed to fix frameshifts in nucleotide sequences, it makes little sense for amino acid sequences. Therefore, from 3.24, the default behavior is to fix frameshifts automatically for nucleotide sequences only. The user can change this behavior.
- A new option has been added to disable automatic trimming of the sequence. When trimming is allowed (the behavior until 3.23), nucleotides or amino acids before or after a chain are removed automatically if they are not part of the germline. This is essential for raw sequences, but it is not expected for well-defined sequences. Therefore, from 3.24, the user can select whether trimming is allowed.
- Until version 3.23, Scaligner showed the aligned sequence in the sequence window and used it for FASTA export. This sequence might contain some deletions for the reasons described above. From now on, the full antibody sequence (from the beginning of the V domain to the end of the J domain) is displayed and exported, and the differences from the aligned sequence are explained.
- Frameshifts and insertions are now counted only when they are found inside the heavy or light chain. They used to be counted between the chains and the linker or the tail as well, which was of little use and hid more important information about the sequence, for instance insertions and frameshifts inside the chains.
- Insertions and frameshifts are now more visible in the alignment window and the Excel summary file (with the “F” flag) and in the sequence window, where they are displayed before the rest of the information. Users are advised to reanalyze a sequence if some amino acids might have been removed by a previous version of Scaligner.
- The “G” flag to inform of the presence of a mutation in a sequence in the alignment window has been replaced by highlighting mutations on the amino acids themselves when the “Compare to germline” option is selected.
- The display of mutations is more user-friendly in the alignment window and the sequence window. It is also more consistent between the two views, which did not use the same indices until 3.23: the sequence window used to give the position in the sequence, whereas it now shows the position in the numbering scheme.
- Errors in the numbering of sequences are now more visible. These errors usually happen when an amino acid is found at an undefined position; for instance, the IMGT position L60 is not defined in Kabat.
- A post-alignment procedure has been implemented to refine the alignment around IMGT CDR1 and CDR2 for sequences that are quite dissimilar from any germline. In such cases, the alignment algorithm used to scatter gaps to increase similarity with the germline. From 3.24, the post-alignment procedure groups amino acids to make the CDR look like an IMGT CDR, even when the alignment has created gaps in the CDR. This procedure is enabled by default, and this setting can be changed before analyzing new sequences.
- Kabat numbering has been improved in some specific cases. Until version 3.23, Kabat numbering expected CDRs to have all their gaps grouped together after the alignment, as they are in IMGT CDRs. This is usually the case, but not for sequences that are quite dissimilar from any germline, for which the alignment algorithm tries to scatter gaps to increase similarity with the germline. The implementation has been changed to group all the gaps found in a CDR together, even if this reduces the similarity to the germline. This creates alignments that are in better agreement with Kabat’s database.
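The gap grouping described in the last two items can be illustrated with a minimal Python sketch. It is not Scaligner's actual implementation: the function name group_cdr_gaps, the "-" gap character, and the choice to put the merged gap run in the middle of the CDR (with the extra residue on the left for odd counts) are all assumptions made for illustration.

```python
def group_cdr_gaps(aligned_cdr: str, gap: str = "-") -> str:
    """Merge all gap characters of an aligned CDR into one central run.

    Hypothetical helper: residue order and total length are preserved,
    only the gap positions change.
    """
    residues = [c for c in aligned_cdr if c != gap]
    n_gaps = len(aligned_cdr) - len(residues)
    # Assumption: for an odd residue count, the left half gets the extra one.
    left = (len(residues) + 1) // 2
    return "".join(residues[:left]) + gap * n_gaps + "".join(residues[left:])


if __name__ == "__main__":
    # Gaps scattered by the aligner are collected into a single block.
    print(group_cdr_gaps("G-FT-FS-SY"))
```

Because only the gaps move, the residues keep their order, but they may no longer face the same germline positions, which is why the similarity to the germline can decrease.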
New functionality has also been added:
- Added Chothia numbering scheme
- Added Martin numbering scheme
- Added clustering module
- Added humanness evaluation with G-score
- New module to analyze and visualize NGS data based on clonotype analysis
- The NGS module can analyze millions of antibody sequences from several animals
- Sequences are processed by Scaligner using a procedure similar to the one described by Trinklein et al. (in section “Variable region amplification, sequencing, and clonotype analysis”).
- Clonotypes are stored and displayed in Scaligner.
- Added AHo numbering
- Change default numbering for all users
- Import sequence by sequence manually
- Display similarity and percent identity in Excel export
- Change default sort scoring from Blosum30 to %Identity
- Search region across database
- Search window is no longer limited to 20 sequences
- Calculate CDR3 frequency across database
- Choose the order to display VH and VL
- Support for new species (rat, mouse, rabbit)
- Download original sequences for a list or for all the lists in a folder
- Document: improved support for WebDAV protocol
- handle complementary sequences
- improved sequence detection (handles wildcards for amino acid and nucleotide sequences)
- display PTM location in sequence log
- speed up sequence analysis
- added support for WebDAV protocol
- speed up document upload in DocExplorer
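As an illustration of the percent identity mentioned above for the Excel export and the default sort, here is a minimal Python sketch. The notes do not state the exact formula Scaligner uses, so this is an assumption: identity is computed over aligned columns in which neither sequence has a gap. The function name percent_identity and the "-" gap character are hypothetical.

```python
def percent_identity(query: str, germline: str, gap: str = "-") -> float:
    """Percent of identical residues between two aligned sequences.

    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(query) != len(germline):
        raise ValueError("aligned sequences must have the same length")
    compared = [(q, g) for q, g in zip(query, germline) if q != gap and g != gap]
    if not compared:
        return 0.0
    matches = sum(q == g for q, g in compared)
    return 100.0 * matches / len(compared)


if __name__ == "__main__":
    print(percent_identity("EVQL-V", "EVKLGV"))
```

Other conventions (for example, dividing by the full alignment length or by the germline length) give different values for gapped alignments.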