Friday, August 26, 2016

A ShapeIt2 wrapper is available

Introduction

One of the latest additions to BioSmalltalk is a wrapper for running the well-known ShapeIt2 software (that is, ShapeIt v2). ShapeIt is a fast and accurate method for estimating haplotypes (a.k.a. phasing) from a set of SNP genotypes (.ped format or its .bed/.bim/.fam binary version) and a genetic map (.map format). As output it produces either a single set of estimated haplotypes or a haplotype graph that encapsulates the uncertainty about the underlying haplotypes. The software is currently available only for Unix-like operating systems.

Usage

To use the wrapper, the program binary must be in a directory listed in the system PATH environment variable, and all input files, whether binary PLINK (bed, bim, fam) or textual PLINK (ped, map), must share the same base name. The following expression launches ShapeIt2 from BioSmalltalk, setting several parameters:
  • The number of burn-in MCMC iterations
  • The input file name (without extension)
  • The output file name for the best haplotypes
  • The number of threads to use for multi-threading
 BioShapeIt2WrapperR727 new
  burn: 10;
  inputBinarized: 'input_brangus';
  outputMax: 'output_brangus';
  thread: 8;
  execute
If you prefer to specify the textual input files explicitly:
 BioShapeIt2WrapperR644 new
  burn: 10;
  inputTextual: 'input_brangus.ped';
  inputMap: 'input_brangus.map';
  outputMax: 'output_brangus';
  thread: 8;
  execute

Features

BioShapeIt2Wrapper is now a superclass for specialized subclasses, each one representing a particular release of ShapeIt2. When I started the wrapper, the ShapeIt2 binary executable was named "shapeit.v2.r644.linux.x86_64". Later I found that "shapeit.v2.r727.linux.x64" had been released, but it cannot be run on CentOS 6.x. So you may want to keep an older version, and also to know which binaries the wrapper supports (this does not mean they are installed on your system, of course):
BioShapeIt2Wrapper releases 
"an OrderedCollection('shapeit.v2.r644.linux.x86_64' 'shapeit.v2.r727.linux.x64')"

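Since the answer is a plain collection of file names, it can be searched with the usual collection protocol. The following is only a minimal sketch: it copies the two names shown above into a literal collection so it can be evaluated without BioSmalltalk loaded, and it assumes a Pharo image that provides includesSubstring:. With the wrapper loaded, the first assignment could be replaced by BioShapeIt2Wrapper releases.

```smalltalk
| releases legacy |
"Literal copy of the names answered above (an assumption made for illustration)."
releases := OrderedCollection withAll: #('shapeit.v2.r644.linux.x86_64' 'shapeit.v2.r727.linux.x64').

"Pick the older r644 binary name, or nil when it is not listed."
legacy := releases
    detect: [ :each | each includesSubstring: 'r644' ]
    ifNone: [ nil ].

"Report what was found on the Transcript."
Transcript show: (legacy ifNil: [ 'no r644 binary listed' ]); cr.
```
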
Wednesday, August 17, 2016

PhyloclassTalk was used to solve a homicide

PhyloclassTalk, an open-source phylogeographic text-mining system based on BioSmalltalk, was used in veterinary forensics to solve a homicide! The September 2016 issue of Legal Medicine includes an article that describes the case in detail. PhyloclassTalk was used to narrow down the BLASTed sequences of the species (Canis familiaris) and to extract the relevant metadata (breed names) from NCBI's GenBank. A hand-crafted database of dog breeds was built and integrated into a PhyloclassTalk repository to classify the sequences by breed name and to identify the breeds located in Argentina, where a sample from an individual was found at a crime scene. Finally, it was also used to build the results and export them to Arlequin format. The PhyloclassTalk paper is almost complete; meanwhile, a beta release of the software can be downloaded from its web site.