The QSeq file format is fully documented in the Illumina pipeline user’s guide (page 163). In brief, the file format is as follows.
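Each record includes the following tab-separated fields, in order: machine name, run number, lane number, tile number, X coordinate, Y coordinate, index, read number, sequence, quality, and filter flag.
Here are two sample lines from a QSeq file. The order of records doesn’t matter, and read mates do not have to be in the same file or in any particular relative position.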
CRESSIA 242 1 2204 1453 1918 0 1 .TTAATAAGAATGTCTGTTGTGGCTTAAAA B[[[W][Y[Zccccccccc\cccac_____ 1
CRESSIA 242 1 2204 1490 1921 0 2 ..GTAAAACCCATATATTGAAAACTACAAA BWUTWcXVXXcccc_cccccccccc_cccc 1
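As a rough illustration, a QSeq line can be split into its fields as sketched below. The field names (`machine`, `run`, `lane`, and so on) and the `QseqRecord` type are labels chosen for this example, following the commonly described 11-column QSeq layout; they are not taken from Seal's code. The sample line is rebuilt with explicit tab characters, since the real file is tab-separated.

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    # Hypothetical field names for the 11 tab-separated QSeq columns.
    machine: str
    run: str
    lane: int
    tile: int
    x: int
    y: int
    index: str
    read: int            # 1 or 2
    sequence: str        # '.' marks an unknown base
    quality: str         # one quality character per base
    passed_filter: bool  # assumed: last column is 1 when the read passed


def parse_qseq_line(line: str) -> QseqRecord:
    """Split one QSeq line into named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 fields, found {len(fields)}")
    machine, run, lane, tile, x, y, index, read, seq, qual, flt = fields
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return QseqRecord(machine, run, int(lane), int(tile), int(x), int(y),
                      index, int(read), seq, qual, flt == "1")


# First sample line from above, joined with tabs.
SAMPLE_LINE = "\t".join([
    "CRESSIA", "242", "1", "2204", "1453", "1918", "0", "1",
    ".TTAATAAGAATGTCTGTTGTGGCTTAAAA",
    r"B[[[W][Y[Zccccccccc\cccac_____",
    "1",
])

record = parse_qseq_line(SAMPLE_LINE)
print(record.machine, record.read, len(record.sequence))
```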
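The PRQ file format is another line-oriented format. Each record contains a read pair in the following tab-separated fields: the read ID, the sequence and quality of the first read, and the sequence and quality of the second read.
Here is a sample line from a PRQ file, constructed from the QSeq lines above: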
CRESSIA_242:1:2204:1453;1918#0 NTTAATAAGAATGTCTGTTGTGGCTTAAAA #<<<8><:<;DDDDDDDDD=DDDBD@@@@@ NNGTAAAACCCATATATTGAAAACTACAAA #8658D9799DDDD@DDDDDDDDDD@DDDD
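The relationship between the QSeq samples and the PRQ sample can be sketched as follows. This is inferred from the sample lines only, not from Seal's implementation: the ID appears to be assembled from the first seven QSeq columns, unknown bases (`.`) become `N`, and each quality character is shifted down by 31 (which is what a Phred+64 to Phred+33 conversion would produce). The function name `qseq_pair_to_prq` is hypothetical.

```python
def qseq_pair_to_prq(line1: str, line2: str) -> str:
    """Build one PRQ line from two tab-separated QSeq lines (a sketch)."""
    f1 = line1.rstrip("\n").split("\t")
    f2 = line2.rstrip("\n").split("\t")

    # The ID is taken from the first read only; this sketch does not
    # check that the two lines really describe mates.
    machine, run, lane, tile, x, y, index = f1[:7]
    read_id = f"{machine}_{run}:{lane}:{tile}:{x};{y}#{index}"

    def convert(seq: str, qual: str) -> tuple:
        # '.' -> 'N'; quality shifted by 64 - 33 = 31 (assumed encodings).
        return seq.replace(".", "N"), "".join(chr(ord(c) - 31) for c in qual)

    seq1, qual1 = convert(f1[8], f1[9])
    seq2, qual2 = convert(f2[8], f2[9])
    return "\t".join([read_id, seq1, qual1, seq2, qual2])


# The two QSeq sample lines, rebuilt with explicit tabs.
QSEQ_1 = "\t".join(["CRESSIA", "242", "1", "2204", "1453", "1918", "0", "1",
                    ".TTAATAAGAATGTCTGTTGTGGCTTAAAA",
                    r"B[[[W][Y[Zccccccccc\cccac_____", "1"])
QSEQ_2 = "\t".join(["CRESSIA", "242", "1", "2204", "1490", "1921", "0", "2",
                    "..GTAAAACCCATATATTGAAAACTACAAA",
                    "BWUTWcXVXXcccc_cccccccccc_cccc", "1"])

PRQ_LINE = qseq_pair_to_prq(QSEQ_1, QSEQ_2)
print(PRQ_LINE)
```

Applied to the two QSeq samples, this reproduces the PRQ sample line field for field.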
Fastq is another text-based file format; it is quite popular, although perhaps only for historical reasons.
As of version 1.8 of Illumina’s Casava software, Illumina is returning to the Fastq format (from the QSeq format). For this reason we have implemented Fastq input in Seal PairReadsQseq.
By default, Seal assumes the Illumina-style Fastq format (see the Casava v. 1.8 user’s guide p. 41). This Fastq format is defined as a series of records. Each record consists of 4 lines: the identifier line (starting with “@”), the sequence, a separator line starting with “+”, and the base qualities.
In Illumina Fastq files the identifier line (line 1) contains several metadata fields about the sequence, in the following format:
@<Instrument>:<Run Number>:<Flowcell ID>:<Lane>:<Tile>:<X-pos>:<Y-pos>SPACE<Read>:<Is Filtered>:<Control Number>:<Index Sequence>
The meaning of each field is as follows.
The Y-pos and Read fields are separated by a SPACE character, while the rest of the fields are separated by colon characters.
Here is an example of a fastq record from the Casava documentation:
@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
+
BBBBCCCC?<A?BC?7@@???????DBBA@@@@A@@
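A minimal sketch of how such an identifier line could be taken apart with a regular expression is shown below. The group names, the `parse_illumina_id` helper, and the choice to convert lane, tile, coordinates and read number to integers are assumptions made for this example; Seal's own parser may differ.

```python
import re
from typing import Optional

# Seven colon-separated fields, one space, then four colon-separated fields.
ILLUMINA_ID = re.compile(
    r"^@(?P<instrument>[^:\s]+):(?P<run>[^:\s]+):(?P<flowcell>[^:\s]+)"
    r":(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)"
    r" (?P<read>\d+):(?P<filtered>[^:\s]+):(?P<control>[^:\s]+)"
    r":(?P<index>\S*)$"
)


def parse_illumina_id(id_line: str) -> Optional[dict]:
    """Return the id fields as a dict, or None if the line doesn't match."""
    match = ILLUMINA_ID.match(id_line.rstrip("\n"))
    if match is None:
        return None
    info = match.groupdict()
    for key in ("lane", "tile", "x", "y", "read"):
        info[key] = int(info[key])
    return info


info = parse_illumina_id("@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG")
print(info)
```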
The id-line format specified above was introduced by Illumina and, as far as we know, is used only by Casava. The “standard” Fastq format places no requirements on the id line and provides no way to express metadata about the read. Still, Seal tries to let you work with “plain” Fastq files, as long as each id ends with “/1” or “/2” so that Seal can extract the read number for the sequence.
Seal first tries to read a Fastq file as an Illumina file, and falls back to the standard format after the first record that doesn’t match the Illumina format.
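The fallback behaviour can be pictured roughly as in the sketch below. It is only an illustration of the idea, not Seal's code: the two regular expressions, the `read_numbers` generator, and the decision to raise an error when neither format matches are all assumptions made for this example.

```python
import re
from typing import Iterable, Iterator

# Simplified Illumina id: 7 colon-separated fields, a space, then 4 more.
ILLUMINA_ID = re.compile(
    r"^@(?:[^:\s]+:){6}[^:\s]+ (?P<read>\d+):[^:\s]*:[^:\s]*:\S*$"
)
# "Plain" id: anything ending in /1 or /2.
PLAIN_ID = re.compile(r"^@.+/(?P<read>[12])$")


def read_numbers(id_lines: Iterable[str]) -> Iterator[int]:
    """Yield the read number for each Fastq id line.

    Starts in Illumina mode and switches permanently to the plain
    "/1" or "/2" convention at the first id that isn't Illumina-style.
    """
    illumina_mode = True
    for line in id_lines:
        line = line.rstrip("\n")
        if illumina_mode:
            match = ILLUMINA_ID.match(line)
            if match:
                yield int(match.group("read"))
                continue
            illumina_mode = False  # from here on, only the plain format
        match = PLAIN_ID.match(line)
        if match is None:
            raise ValueError(f"cannot extract read number from: {line!r}")
        yield int(match.group("read"))


ids = [
    "@EAS139:136:FC706VJ:2:5:1000:12850 1:Y:18:ATCACG",
    "@frag7/2",
    "@frag8/1",
]
print(list(read_numbers(ids)))
```

Once the reader has switched to the plain format in this sketch, an id that lacks the “/1” or “/2” suffix is treated as an error, since the read number can no longer be determined.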