I want to show you another formatting example, taken from a recent post on the BioPerl mailing list. This one involves parsing a CSV file, a very common task in bioinformatics programming. The question is about building a consolidated FASTA file from a source FASTA file and a CSV file that pairs each sequence identifier with its corresponding name. First, as before, let's use two dummy files.

The DNANumbers-Sequences.fasta file
>2863
AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC
>2864
AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT

The DNANumbers-TaxaNames.csv file

2863 Gelidium
2864 Poa

The script

| multiFasta hashTable |
multiFasta := BioParser parseMultiFasta: ( BioFASTAFile on: 'DNANumbers-Sequences.fasta') contents.
hashTable := BioParser
    tokenizeCSV: ( BioCSVFile on: 'DNANumbers-TaxaNames.csv' ) contents
    delimiter: Character space.
( multiFasta renameFromDictionary: hashTable ) outputToFile: 'Renamed-Sequences.fa'.

One more thing to take into account: you are responsible for specifying the delimiter of your CSV file. This will remain the case until someone implements a pattern-recognition algorithm that detects the delimiter of a CSV file.
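To make the role of the delimiter concrete, here is a minimal sketch in plain Pharo Smalltalk of what tokenizing on an explicit delimiter and renaming by lookup could look like. It does not use BioSmalltalk and is not how BioParser is implemented. The variable names (csvText, delimiter, taxaNames, headers, renamed) and the identifier '9999' are hypothetical and exist only for illustration.

```smalltalk
"Plain Pharo sketch under stated assumptions; none of this is BioSmalltalk API."
| csvText delimiter taxaNames headers renamed |

"Same two records as DNANumbers-TaxaNames.csv, built in memory."
csvText := '2863 Gelidium' , String cr , '2864 Poa'.

"The delimiter is chosen by the caller; nothing is auto-detected."
delimiter := Character space.

"Split each line on the delimiter and store identifier -> name."
taxaNames := Dictionary new.
csvText lines do: [ :line |
    | fields |
    fields := line substrings: (String with: delimiter).
    fields size >= 2
        ifTrue: [ taxaNames at: fields first put: fields second ] ].

"Look up each FASTA identifier; keep it unchanged when the table has no entry.
 '9999' is a made-up identifier that is absent from the table."
headers := #('2863' '2864' '9999').
renamed := headers collect: [ :id | taxaNames at: id ifAbsent: [ id ] ].
renamed
```

Evaluated in a Playground, renamed should hold 'Gelidium', 'Poa' and '9999'. If the file used tabs instead, only the delimiter assignment would change (Character tab); with the wrong delimiter each line stays a single field and no entry is stored, which is why the choice is left to you.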
 
 
 