Let's write plain Smalltalk code to download the human chromosome 22 FASTA file from the NCBI servers (about 9.6 MB, gzip-compressed):
| client fileName fStream |
fileName := 'hs_alt_HuRef_chr22.fa.gz'.
"Open an anonymous FTP session in binary mode and move to the chromosome directory."
[ client := (FTPClient openOnHostNamed: 'ftp.ncbi.nlm.nih.gov')
      loginUser: 'anonymous' password: '';
      binary;
      changeDirectoryTo: 'genomes/H_sapiens/CHR_22';
      yourself. "make sure the cascade answers the client itself"
  "Write the downloaded bytes to a local file with the same name."
  (FileStream newFileNamed: fileName)
      binary;
      nextPutAll: (client getFileNamed: fileName);
      close ]
    on: NetworkError, LoginFailedException
    do: [ :ex | self error: 'Connection failed' ].
"Read the file back and serialize its contents with Fuel."
fStream := fileName asFileReference readStream.
(ByteArray streamContents: [ :stream |
    FLSerializer serialize: fStream binary contents on: stream ]) storeString.
That seems like a lot of typing for a bioinformatics library and for the Smalltalk tradition. That's why I wrote a
Genome Downloader class, which makes it really easy to download the latest build:
BioHSapiensGD new downloadChromosome: 22.
If you don't want the call to block, you can download in the background by forking the block as a process at a lower priority:
[ BioHSapiensGD new downloadChromosome: 22 ]
    forkAt: Processor userBackgroundPriority
    named: 'Downloading Human Chromosome...'.
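A forked process gives no feedback when it finishes or fails, so here is a minimal sketch of the same background download with a completion message and an error handler. It assumes that downloadChromosome: signals an Error subclass when something goes wrong, which I have not checked for every failure mode. The Transcript messages are just illustrative.

```smalltalk
"Sketch: background download that reports success or failure on the Transcript.
 Assumption: downloadChromosome: signals an Error subclass on failure."
[ [ BioHSapiensGD new downloadChromosome: 22.
    Transcript show: 'Chromosome 22 downloaded'; cr ]
      on: Error
      do: [ :ex |
          "description always answers a String, even without a message text"
          Transcript show: 'Download failed: ', ex description; cr ] ]
    forkAt: Processor userBackgroundPriority
    named: 'Downloading Human Chromosome...'.
```

Because the handler lives inside the forked block, a failure is logged instead of opening a debugger on the background process, and the image stays responsive either way.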
Results are saved to the directory where the virtual image's .image and .changes files are located. But why stop at humans? There are subclasses for Bos taurus (from the UMD, Center for Bioinformatics and Computational Biology, University of Maryland, and The Bovine Genome Sequencing Consortium), Gallus gallus (International Chicken Genome Sequencing Consortium) and Mus musculus (Celera Genomics and Genome Reference Consortium), and others can be built by specializing just a few methods. Any available assembled genome can be downloaded with one line of code.
Enjoy.