Thursday, August 20, 2015

Browsing more than 1.2 million formal scientific names from the NCBI Taxonomy Database

This post does not require loading or installing BioSmalltalk or PhyloclassTalk; it uses a plain Pharo image with the FastTable package.

As part of the PhyloclassTalk project, I wanted to add a feature to browse all the formal scientific names found in the full NCBI taxonomy database. The FastTable package, recently announced on the Pharo mailing list, made me wonder how well a FastTable Morphic window would perform with that whole list as its contents. You can also download the taxonomy dump list I used for this experiment. I filtered the original file (taxdmp.zip) to remove "noise" (synonyms, authorities). On a Sony Vaio with an i3 at 2.40 GHz, reading the file and opening the window takes about 4 seconds, and you get a fully scrollable list with no pagination and no lag. First we open the FastTable widget with a basic benchmark:
Smalltalk garbageCollect.
[ 
| speciesDumpReader speciesDumpList | 
speciesDumpReader := 'scientific_names.dmp' asFileReference readStream.
speciesDumpList := speciesDumpReader contents lines.
FTEasyListMorph new
  extent: 300@550;
  elements: speciesDumpList;
  openInWindow
] timeToRun.
"0:00:00:03.968" "0:00:00:04.249"
Now let's go for a more functional species "browser" by adding a menu callback that opens the Google results for the selected taxon:
"Same list as before, now with a header and a context menu for each entry."
| speciesDumpReader speciesDumpList |
speciesDumpReader := 'scientific_names.dmp' asFileReference readStream.
speciesDumpList := speciesDumpReader contents lines.
FTEasyListMorph new
  header: 'NCBI Taxonomy Database List';
  extent: 300 @ 550;
  elements: speciesDumpList;
  "The menu block receives the selected name and answers the menu to show."
  menu: [ :taxaName |
    MenuMorph new
      "NBWin32Shell is Windows-only."
      add: 'Open External Browser'
        target: NBWin32Shell
        selector: #shellBrowse:
        argument: 'https://www.google.com/?gws_rd=ssl#q=' , taxaName;
      yourself ];
  openInWindow
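Most scientific names contain spaces (genus plus species), and the callback above appends the name to the URL as is. A minimal sketch of percent-encoding the name first is shown below. It assumes that Zinc, which provides `urlEncoded` on strings, is present in the image; the example name and the variable names are only illustrative.

```smalltalk
"Sketch: percent-encode a taxon name before appending it to the search URL.
 Assumes Zinc is loaded, since it provides String>>urlEncoded."
| taxonName searchUrl |
taxonName := 'Homo sapiens'.    "illustrative value, not read from the dump file"
"Unary #urlEncoded is evaluated before the binary #, concatenation."
searchUrl := 'https://www.google.com/?gws_rd=ssl#q=' , taxonName urlEncoded.
searchUrl
```

If that works in your image, the same expression could replace the `argument:` value in the menu block. Whether the external browser needs the encoding at all is untested here.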
and of course, a screenshot:
I hope to see more of these cool packages coming to the Pharo Smalltalk community. Enjoy!