Sunday, July 15, 2012

Using the BioEntrezClient

Sometimes you have a list of accession numbers and want to get the corresponding FASTA sequences. This is the way to do it:

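| result |
result := BioEntrezClient new nuccore
                uids: #('AB177765.1' 'AB177791.1');
                setFasta;
                fetch. "assumption: the closing message is missing from this listing; #fetch is taken to run the request"

result outputToFile: 'nuccore_seqs.fasta'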

The script downloads the sequences in one trip via the NCBI Entrez API. If you want the GenBank format instead, send #setGb instead of #setFasta above. The default format is ASN.1, which is a "hard" format for bioinformaticians to work with.

To download PubMed records from UIDs, a simple script could be:

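| result |

result := 
 BioEntrezClient new pubmed 
  uids: #( 11877539 11822933 );
  fetch. "assumption: the closing message is missing from this listing; #fetch is taken to run the request"
result outputToFile: 'fetchPubmed-01.txt'

The spelling service works in much the same way, here using the PubMed Central database: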

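| result |
result := 
 BioEntrezClient new pmc 
  term: 'fiberblast cell grwth';
  spell. "assumption: the closing message is missing from this listing; #spell is taken to run the spelling query"
result outputToFile: 'eSpell1.xml'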

For some classes you will want to see which messages you can send, for example to find out which databases are available through Entrez. This is easily done with Smalltalk's reflection capabilities:

BioEntrezClient organization
   listAtCategoryNamed: 'accessing public - databases'
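
If you do not know the category name beforehand, the same idea can be extended to walk every category of the class. The following is only a sketch: it assumes a Squeak/Pharo image where the class organization answers #categories and #listAtCategoryNamed: (newer images may name these differently), and the temporary variable organizer is just an illustrative name.

```smalltalk
"Sketch: print each method category of BioEntrezClient followed by its selectors.
Assumes the class organization understands #categories and #listAtCategoryNamed:."
| organizer |
organizer := BioEntrezClient organization.
organizer categories do: [ :category |
    "Category name on its own line"
    Transcript show: category; cr.
    "Selectors filed under that category, indented with a tab"
    (organizer listAtCategoryNamed: category) do: [ :selector |
        Transcript tab; show: selector; cr ] ]
```

Evaluate it in a Workspace with the Transcript open. The category 'accessing public - databases' should show up in the listing along with the rest.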
