Thursday, August 20, 2015

Browsing over 1.2 million formal scientific names from the NCBI Taxonomy Database.

The contents of this post do not require loading or installing BioSmalltalk or PhyloclassTalk; they only need a plain Pharo image with the FastTable package.

As part of the PhyloclassTalk project, I wanted to add a feature to browse all formal scientific names found in the full NCBI Taxonomy database. FastTable, recently announced on the Pharo mailing list, made me wonder how well a FastTable Morphic window would perform when opened with that whole list.

You can also download the taxonomy dump list I used for this experiment. I filtered the original file (taxdmp.zip) to remove "noise" (synonyms, authorities). On a Sony Vaio with an i3 at 2.40 GHz, it takes about 4 seconds, and you get a fully scrollable list with no pagination and no lag. First, we open the FastTable widget inside a basic benchmark:
Smalltalk garbageCollect.
[ | speciesDumpList |
	"Read the whole dump and split it into one scientific name per line"
	speciesDumpList := 'scientific_names.dmp' asFileReference contents lines.
	FTEasyListMorph new
		extent: 300 @ 550;
		elements: speciesDumpList;
		openInWindow ] timeToRun.
"Two sample runs: 0:00:00:03.968 and 0:00:00:04.249"
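The benchmark reads scientific_names.dmp, a list already filtered from taxdmp.zip; the filtering step itself is not shown in this post. As a minimal sketch only, assuming names.dmp has been extracted from taxdmp.zip into the image directory, assuming its usual layout (tab-pipe-tab separated fields: tax_id, name_txt, unique name, name class), and assuming that keeping only rows whose name class is "scientific name" approximates removing synonyms and authorities, a plain Pharo image could produce a one-name-per-line file like this. It is not necessarily the filter used for the downloadable list, and the block name nameFromLine is hypothetical:

```smalltalk
| separators nameFromLine scientificNames |
"Assumed names.dmp row layout: tax_id<TAB>|<TAB>name_txt<TAB>|<TAB>unique name<TAB>|<TAB>name class<TAB>|"
separators := String with: Character tab with: $|.
"Hypothetical helper: answer name_txt when the row is a scientific name, nil otherwise.
 substrings: drops empty fields (e.g. a blank unique name), so the name class is always the last token"
nameFromLine := [ :line |
	| fields |
	fields := line substrings: separators.
	(fields size >= 2 and: [ fields last = 'scientific name' ])
		ifTrue: [ fields second ]
		ifFalse: [ nil ] ].
"Read names.dmp line by line instead of loading the raw dump into memory at once"
scientificNames := OrderedCollection new.
'names.dmp' asFileReference readStreamDo: [ :in |
	[ in atEnd ] whileFalse: [
		(nameFromLine value: in nextLine)
			ifNotNil: [ :name | scientificNames add: name ] ] ].
"Replace any previous output, then write one name per line (the format read back with #lines)"
'scientific_names.dmp' asFileReference
	ensureDelete;
	writeStreamDo: [ :out |
		scientificNames do: [ :name | out nextPutAll: name; nextPut: Character lf ] ].
scientificNames size
```

Keeping name_txt rather than the unique name means homonyms (the same name used for taxa in different kingdoms) appear more than once in the output; keep the third field instead if you need distinct entries.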
Now let's build a more functional species "browser" by adding a context-menu callback that opens the Google results for the selected taxon:
| speciesDumpList |
speciesDumpList := 'scientific_names.dmp' asFileReference contents lines.
FTEasyListMorph new
	header: 'NCBI Taxonomy Database List';
	extent: 300 @ 550;
	elements: speciesDumpList;
	"Build a context menu for the selected scientific name"
	menu: [ :taxaName |
		MenuMorph new
			"NBWin32Shell (NativeBoost) opens the URL through the Windows shell, so this item works on Windows only.
			 urlEncoded escapes spaces and other special characters in the name"
			add: 'Open External Browser'
				target: NBWin32Shell
				selector: #shellBrowse:
				argument: 'https://www.google.com/?gws_rd=ssl#q=' , taxaName urlEncoded;
			yourself ];
	openInWindow
And, of course, a screenshot:
I hope to see more cool packages like this one coming to the Pharo Smalltalk community. Enjoy!

Friday, March 20, 2015

BioSmalltalk now available through GitHub

I have created the BioSmalltalk repository on GitHub, so you can clone it and contribute from there.

I hope this will make it easier for interested parties to contribute to this code or to specialize it for their own needs. Regular distributions will still be published on Google Code (for now), but if you want the absolute latest changes, GitHub is the place to go.

If you are interested, please feel free to get involved.

Link: http://github.com/hernanmd/BioSmalltalk

Regards,

Hernán