NCBI SRA sourced fastq data
In these FASTQ data:
- The quality score identifier (+) sometimes does not match the sequence identifier (@).
- The forward and reverse reads may be interlaced and need to be separated into distinct datasets.
- Both problems may be present in the same dataset. Correct the first, then the second, as explained below.
- Format problems of any kind can cause tool failures and/or unexpected results.
- Fix the problems before running any other tools (including FastQC, Fastq Groomer, or other QA tools).
For inconsistent sequence (@) and quality (+) identifiers
- Correct the format by running the tool Replace Text in entire line with these options:
    - Find pattern: ^\+SRR.+
    - Replace with: +
Note: If the quality score line starts with a different prefix, such as “+ERR” (or another valid option), modify the Find pattern to match, for example ^\+ERR.+
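To illustrate what the Replace Text step does to the data, here is a minimal Python sketch of the same find-and-replace applied outside Galaxy. It is not the Galaxy tool's code. The function name and the sample record are made up for illustration, and the pattern assumes the quality score line starts with “+SRR”.

```python
import re

# Same expression as the "Find pattern" above; change SRR to ERR (or another
# prefix) if the quality score lines in your data are named differently.
PLUS_LINE = re.compile(r"^\+SRR.+")


def normalize_plus_lines(lines):
    """Replace each line matching PLUS_LINE with a bare '+' (hypothetical helper)."""
    return [PLUS_LINE.sub("+", line) for line in lines]


# Made-up record for illustration only.
record = [
    "@SRR000001.1 example/1",
    "ACGT",
    "+SRR000001.1 example/1",
    "IIII",
]
fixed = normalize_plus_lines(record)
```

After the replacement, the third line of each record is a plain “+”, which removes any mismatch with the “@” line.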
For interlaced forward and reverse reads
Solution 1 (reads named /1 and /2)
- Use the tool FASTQ de-interlacer on paired end reads
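As a rough picture of what de-interlacing means, the sketch below splits interlaced records into forward and reverse lists. It is only an illustration and not how the FASTQ de-interlacer tool is implemented. It assumes every record is exactly four lines and that each identifier line ends in /1 or /2. The function name and sample reads are hypothetical.

```python
def deinterlace(lines):
    """Split interlaced 4-line FASTQ records into (forward, reverse) line lists."""
    forward, reverse = [], []
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        header = record[0].rstrip()
        if header.endswith("/1"):
            forward.extend(record)
        elif header.endswith("/2"):
            reverse.extend(record)
        else:
            # Assumption violated: the identifier carries no /1 or /2 suffix.
            raise ValueError(f"unexpected identifier: {header}")
    return forward, reverse


# Made-up interlaced pair for illustration only.
interlaced = [
    "@read1/1", "AAAA", "+", "IIII",
    "@read1/2", "TTTT", "+", "IIII",
]
forward, reverse = deinterlace(interlaced)
```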
Solution 2 (reads named /1 and /2)
- Create distinct datasets from an interlaced fastq dataset by running the tool Manipulate FASTQ reads on various attributes on the original dataset. Run the tool twice, once for each read direction.
Note: The solution does NOT use the FASTQ Splitter tool. The data to be manipulated are interlaced sequences. This is different in format from data that are joined into a single sequence.
- Use these Manipulate FASTQ settings to produce a dataset that contains the /1 reads:
    - Match Reads
        - Match Reads by: Name/Identifier
        - Identifier Match Type: Regular Expression
        - Match by: .+/2
    - Manipulate Reads
        - Manipulate Reads by: Miscellaneous Actions
        - Miscellaneous Manipulation Type: Remove Read
- Use these Manipulate FASTQ settings to produce a dataset that contains the /2 reads:
    - Exact same settings as above except for this change: Match by .+/1
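The two runs in Solution 2 can be pictured with the following Python sketch: each run removes the reads whose identifier matches the expression and keeps the rest. This is an approximation for illustration, not the Manipulate FASTQ tool's code. It assumes four-line records and that the expression is searched against the identifier without the leading “@”; the tool's exact matching behavior may differ. The function name and sample reads are hypothetical.

```python
import re


def remove_matching_reads(lines, pattern):
    """Drop every 4-line record whose identifier matches the regular expression."""
    regex = re.compile(pattern)
    kept = []
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        identifier = record[0][1:]  # strip the leading '@'
        if regex.search(identifier):
            continue  # "Remove Read"
        kept.extend(record)
    return kept


# Made-up interlaced pair for illustration only.
interlaced = [
    "@read1/1", "AAAA", "+", "IIII",
    "@read1/2", "TTTT", "+", "IIII",
]
forward_reads = remove_matching_reads(interlaced, r".+/2")  # first run keeps /1
reverse_reads = remove_matching_reads(interlaced, r".+/1")  # second run keeps /2
```

Removing the mate rather than selecting the wanted read is why the /1 dataset is produced with the .+/2 expression, and the /2 dataset with .+/1.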
Solution 3 (reads named /1 and /3)
- Use the same operations as in Solution 2 above, except change the first Manipulate FASTQ query term to be:
- Match by .+/3
Solution 4 (reads named without /N)
- If your data has differently formatted sequence identifiers, the “Match by” expression from Solution 2 above can be modified to suit your identifiers.
Alternative identifiers such as:
@M00946:180:000000000-ANFB2:1:1107:14919:14410 1:N:0:1
@M00946:180:000000000-ANFB2:1:1107:14919:14410 2:N:0:1
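For identifiers like the two above, where the read number follows a space instead of a slash, one possible adaptation of the “Match by” expression is sketched below in Python. The two expressions are suggestions to test on your own data, not settings taken from the tool documentation, and they assume the read number is the first field after a single space.

```python
import re

# Hypothetical "Match by" expressions for space-separated read numbers.
MATCH_READ_1 = r".+ 1:.+"
MATCH_READ_2 = r".+ 2:.+"

# The two example identifiers shown above.
id_forward = "@M00946:180:000000000-ANFB2:1:1107:14919:14410 1:N:0:1"
id_reverse = "@M00946:180:000000000-ANFB2:1:1107:14919:14410 2:N:0:1"

# Check each expression against both identifiers before using it in a tool run.
forward_hits = [bool(re.search(MATCH_READ_1, i)) for i in (id_forward, id_reverse)]
reverse_hits = [bool(re.search(MATCH_READ_2, i)) for i in (id_forward, id_reverse)]
```

Each expression should match exactly one identifier of the pair. If both or neither match, the expression needs further adjustment for your naming scheme.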
