After a sequencing run, quality control is one of the first things you will need to do. Unfortunately, sample mix-ups and other issues can and do happen. Systematic biases can also be introduced by a particular machine or lane.
This script extracts basic information from a set of FASTQs and writes it to a summary file (fastq_summary.txt). It works with demultiplexed FASTQs generated by Illumina machines whose read headers follow this format (for example, @HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1):
- @HWI-EAS209_0006_FC706VJ – machine name
- 5 – lane
- 58 – tile within flowcell lane
- 5894 – x coordinate of cluster within tile
- 21141 – y coordinate of cluster within tile
- #ATCACG – index
- /1 – member of pair (/1 or /2)
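To make the field layout above concrete, here is a minimal sketch of how one such header line could be split into its parts. It is not the author's script. It assumes Python 3 and headers that follow exactly the colon, `#` and `/` layout shown above. The names `HEADER_RE`, `ReadHeader`, `parse_header` and `first_header` are hypothetical and exist only for this illustration.

```python
import gzip
import re
from typing import NamedTuple, Optional

# Assumed layout (as in the list above):
# @<machine>:<lane>:<tile>:<x>:<y>#<index>/<pair>
HEADER_RE = re.compile(
    r"^@(?P<machine>[^:]+)"
    r":(?P<lane>\d+)"
    r":(?P<tile>\d+)"
    r":(?P<x>\d+)"
    r":(?P<y>\d+)"
    r"#(?P<index>[^/]+)"
    r"/(?P<pair>[12])$"
)


class ReadHeader(NamedTuple):
    machine: str
    lane: int
    tile: int
    x: int
    y: int
    index: str
    pair: int


def parse_header(line: str) -> Optional[ReadHeader]:
    """Split one header line into its fields; return None if it does not match."""
    m = HEADER_RE.match(line.strip())  # strip() drops the trailing newline
    if m is None:
        return None
    return ReadHeader(
        machine=m["machine"],
        lane=int(m["lane"]),
        tile=int(m["tile"]),
        x=int(m["x"]),
        y=int(m["y"]),
        index=m["index"],
        pair=int(m["pair"]),
    )


def first_header(path: str) -> Optional[ReadHeader]:
    """Parse the first header line of a FASTQ file (plain text or .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_header(handle.readline())


if __name__ == "__main__":
    h = parse_header("@HWI-EAS209_0006_FC706VJ:5:58:5894:21141#ATCACG/1")
    print(h.machine, h.lane, h.index, h.pair)
    # HWI-EAS209_0006_FC706VJ 5 ATCACG 1
```

Reading only the first header of each file is enough when a demultiplexed FASTQ comes from a single machine, lane and index. That is an assumption worth checking on your own data. Headers written in the newer CASAVA 1.8+ style use a different layout and will not match this pattern, so `parse_header` returns `None` for them.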
The script below will extract the machine name, lane, index, and pair.