Antiretroviral Therapy Optimisation without Genotype Resistance Testing: A Perspective on Treatment History Based Models

Written on October 29, 2010 – 7:00 am

Background

Although genotypic resistance testing (GRT) is recommended to guide combination antiretroviral therapy (cART), funding and/or facilities to perform GRT may not be available in low to middle income countries. Since treatment history (TH) impacts response to subsequent therapy, we investigated a set of statistical learning models to optimise cART in the absence of GRT information.

Methods and Findings

The EuResist database was used to extract 8-week and 24-week treatment change episodes (TCE) with GRT and additional clinical, demographic and TH information. Random Forest (RF) classification was used to predict 8- and 24-week success, defined as undetectable HIV-1 RNA, comparing nested models including (i) GRT+TH and (ii) TH without GRT, using multiple cross-validation and area under the receiver operating characteristic curve (AUC). Virological success was achieved in 68.2% and 68.0% of TCE at 8 and 24 weeks (n = 2,831 and 2,579), respectively. RF (i) and (ii) showed comparable performances, with an average (st.dev.) AUC of 0.77 (0.031) vs. 0.757 (0.035) at 8 weeks and 0.834 (0.027) vs. 0.821 (0.025) at 24 weeks. Sensitivity analyses, carried out on a data subset that included antiretroviral regimens commonly used in low to middle income countries, confirmed our findings. Training on subtype B and validation on non-B isolates resulted in a decline of performance for models (i) and (ii).
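The AUC used above to compare the two models can be read as the probability that a randomly chosen successful episode receives a higher predicted score than a randomly chosen failure. The following is a minimal sketch of that calculation only; the labels and scores are made-up illustration data, and the function name `roc_auc` is an assumption, not something taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (success, failure) pairs ranked correctly.

    labels: 1 for virological success, 0 for failure.
    scores: predicted probability of success from any classifier.
    Ties between a success and a failure count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes and scores from two hypothetical models.
labels = [1, 1, 0, 1, 0, 0]
scores_model_a = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
scores_model_b = [0.9, 0.7, 0.8, 0.4, 0.5, 0.2]

auc_a = roc_auc(labels, scores_model_a)
auc_b = roc_auc(labels, scores_model_b)
```

Averaging such values over repeated cross-validation folds would give a mean and standard deviation of the kind reported in the abstract.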

Conclusions

Treatment history-based RF prediction models are comparable to GRT-based models for classification of virological outcome. These results may be relevant for therapy optimisation in areas where availability of GRT is limited. Further investigations are required in order to account for different demographics, subtypes and different therapy switching strategies.


Posted in Computational biology

MicroRNA-Integrated and Network-Embedded Gene Selection with Diffusion Distance

Written on October 29, 2010 – 7:00 am

Gene network information has been used to improve gene selection in microarray-based studies by selecting marker genes based both on their expression and on the coordinate expression of genes within their gene network under a given condition. Here we propose a new network-embedded gene selection model. In this model, we first address the limitations of microarray data. Microarray data, although widely used for gene selection, measure only mRNA abundance, which does not always reflect the ultimate gene phenotype, since it does not account for post-transcriptional effects. To overcome this important (and in certain cases critical) limitation, which almost all existing studies ignore, we design a new strategy to integrate microarray data with information on microRNA, the major post-transcriptional regulatory factor. We also handle the challenges posed by the gene collaboration mechanism. To incorporate the biological facts that genes without direct interactions may work closely together due to signal transduction and that two genes may be functionally connected through multiple paths, we adopt the concept of diffusion distance. This concept permits us to simulate biological signal propagation and therefore to estimate the collaboration probability for all gene pairs, whether directly or indirectly connected, according to the multiple paths connecting them. We demonstrate, using type 2 diabetes (DM2) as an example, that the proposed strategies can enhance the identification of functional gene partners, which is the key issue in a network-embedded gene selection model. More importantly, we show that our gene selection model outperforms related ones. Genes selected by our model 1) have improved classification capability; 2) agree with biological evidence of DM2-association; and 3) are involved in many well-known DM2-associated pathways.
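To make the idea of diffusion distance concrete, here is a small sketch using the common random-walk definition, in which two nodes are close when walks started from them end up distributed similarly after a few steps. The toy network, the function names, and the choice of two steps are all assumptions for illustration; the paper's exact formulation and its microRNA integration are not reproduced here.

```python
def transition_matrix(adj):
    """Row-normalise an adjacency matrix into random-walk probabilities.

    Assumes an undirected network in which every node has at least one edge.
    """
    return [[w / sum(row) for w in row] for row in adj]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_distance(adj, i, j, steps):
    """Distance between the `steps`-step walk distributions from nodes i and j.

    Each squared difference is weighted by the inverse stationary probability
    of the target node (degree divided by total degree).
    """
    p = transition_matrix(adj)
    pt = p
    for _ in range(steps - 1):
        pt = mat_mul(pt, p)
    total_degree = sum(sum(row) for row in adj)
    stationary = [sum(row) / total_degree for row in adj]
    squared = sum((pt[i][k] - pt[j][k]) ** 2 / stationary[k]
                  for k in range(len(adj)))
    return squared ** 0.5


# Hypothetical network: genes 0, 1, 2 form a triangle; gene 3 hangs off gene 2.
toy_network = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

d_01 = diffusion_distance(toy_network, 0, 1, steps=2)
d_03 = diffusion_distance(toy_network, 0, 3, steps=2)
```

In this toy network genes 0 and 1 share more connecting paths than genes 0 and 3, so `d_01` comes out smaller than `d_03`.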


Posted in Computational biology

Systematical Detection of Significant Genes in Microarray Data by Incorporating Gene Interaction Relationship in Biological Systems

Written on October 29, 2010 – 7:00 am

Many methods, including parametric, nonparametric, and Bayesian methods, have been used for detecting differentially expressed genes based on the assumption that biological systems are linear, which ignores the nonlinear characteristics of most biological systems. More importantly, those methods do not simultaneously consider means, variances, and higher moments, resulting in a relatively high false positive rate. To overcome these limitations, the SWang test is proposed to determine differentially expressed genes according to the equality of distributions between case and control. Our method not only latently incorporates functional relationships among genes to account for nonlinear biological systems but also considers the mean, variance, skewness, and kurtosis of expression profiles simultaneously. To illustrate the biological significance of higher moments, we construct a nonlinear gene interaction model, demonstrating that skewness and kurtosis could contain useful information about functional association among genes in microarrays. Simulations and real microarray results show that the false positive rate of SWang is lower than that of currently popular methods (T-test, F-test, SAM, and Fold-change), with much higher statistical power. Additionally, SWang can uniquely detect significant genes in real microarray data with imperceptible differential expression but greater variation in kurtosis and skewness. Those identified genes were confirmed by previously published literature or by RT-PCR experiments performed in our lab.
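The four quantities named above (mean, variance, skewness, kurtosis) can be computed from an expression profile as follows. This sketch only illustrates the moments themselves, using population formulas and invented values; it is not the SWang test, whose statistic is not described in this abstract.

```python
def moments(values):
    """Return (mean, variance, skewness, excess kurtosis) of a sample.

    Uses population (divide-by-n) central moments. Excess kurtosis is the
    fourth standardised moment minus 3, so a normal distribution scores 0.
    """
    n = len(values)
    if n == 0:
        raise ValueError("empty sample")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("zero variance: skewness and kurtosis undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical expression values: a symmetric profile and a right-skewed one.
symmetric_profile = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed_profile = [0.0, 0.0, 0.0, 0.0, 5.0]

sym_mean, sym_var, sym_skew, sym_kurt = moments(symmetric_profile)
skw_mean, skw_var, skw_skew, skw_kurt = moments(skewed_profile)
```

Two profiles can have similar means yet differ clearly in skewness and kurtosis, which is the kind of difference a mean-only test would miss.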


Posted in Computational biology

Identifying Shared Genetic Structure Patterns among Pacific Northwest Forest Taxa: Insights from Use of Visualization Tools and Computer Simulations

Written on October 29, 2010 – 7:00 am

Background

Identifying causal relationships in phylogeographic and landscape genetic investigations is notoriously difficult, but can be facilitated by use of multispecies comparisons.

Methodology/Principal Findings

We used data visualizations to identify common spatial patterns within single lineages of four taxa inhabiting Pacific Northwest forests (northern spotted owl: Strix occidentalis caurina; red tree vole: Arborimus longicaudus; southern torrent salamander: Rhyacotriton variegatus; and western white pine: Pinus monticola). Visualizations suggested that, despite occupying the same geographical region and habitats, species responded differently to prevailing historical processes. S. o. caurina and P. monticola demonstrated directional patterns of spatial genetic structure where genetic distances and diversity were greater in southern versus northern locales. A. longicaudus and R. variegatus displayed opposite patterns where genetic distances were greater in northern versus southern regions. Statistical analyses of directional patterns subsequently confirmed observations from visualizations. Based upon regional climatological history, we hypothesized that observed latitudinal patterns may have been produced by range expansions. Subsequent computer simulations confirmed that directional patterns can be produced by expansion events.

Conclusions/Significance

We discuss phylogeographic hypotheses regarding historical processes that may have produced observed patterns. Inferential methods used here may become increasingly powerful as detailed simulations of organisms and historical scenarios become plausible. We further suggest that inter-specific comparisons of historical patterns take place prior to drawing conclusions regarding effects of current anthropogenic change within landscapes.


Posted in Computational biology

Do Seasons Have an Influence on the Incidence of Depression? The Use of an Internet Search Engine Query Data as a Proxy of Human Affect

Written on October 28, 2010 – 7:00 am

Background

Seasonal depression has generated considerable clinical interest in recent years. Despite a common belief that people in higher latitudes are more vulnerable to low mood during the winter, it has never been demonstrated that human moods are subject to seasonal change on a global scale. The aim of this study was to investigate large-scale seasonal patterns of depression using Internet search query data as a signature and proxy of human affect.

Methodology/Principal Findings

Our study was based on a publicly available search engine database, Google Insights for Search, which provides time series data of weekly search trends from January 1, 2004 to June 30, 2009. We applied an empirical mode decomposition method to isolate seasonal components of health-related search trends of depression in 54 geographic areas worldwide. We identified a seasonal trend of depression that was opposite between the northern and southern hemispheres; this trend was significantly correlated with seasonal oscillations of temperature (USA: r = −0.872, p<0.001; Australia: r = −0.656, p<0.001). Based on analyses of search trends over 54 geographic locations worldwide, we found that the degree of correlation between searching for depression and temperature was latitude-dependent (northern hemisphere: r = −0.686; p<0.001; southern hemisphere: r = 0.871; p<0.0001).
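The r values quoted above are correlation coefficients between a seasonal search signal and temperature. As a rough illustration of how such a coefficient is obtained, the sketch below computes Pearson's r for two monthly series. Both series are invented for this example, and the empirical mode decomposition step that the study applies beforehand is not shown.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical monthly values for one northern-hemisphere location.
temperature_c = [2, 4, 9, 14, 19, 24, 27, 26, 21, 15, 8, 3]
search_index = [78, 76, 71, 66, 60, 55, 52, 54, 59, 65, 72, 77]

r = pearson_r(temperature_c, search_index)
```

A strongly negative `r` here means searches rise as temperature falls, which is the direction of the relationship reported for the USA and Australia.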

Conclusions/Significance

Our findings indicate that Internet searches for depression from people in higher latitudes are more vulnerable to seasonal change, whereas this phenomenon is obscured in tropical areas. This phenomenon exists universally across countries, regardless of language. This study provides novel, Internet-based evidence for the epidemiology of seasonal depression.


Posted in Computer Science

Identification and Optimization of Classifier Genes from Multi-Class Earthworm Microarray Dataset

Written on October 28, 2010 – 7:00 am

Monitoring, assessment and prediction of environmental risks that chemicals pose demand rapid and accurate diagnostic assays. A variety of toxicological effects have been associated with explosive compounds TNT and RDX. One important goal of microarray experiments is to discover novel biomarkers for toxicity evaluation. We have developed an earthworm microarray containing 15,208 unique oligo probes and have used it to profile gene expression in 248 earthworms exposed to TNT, RDX or neither. We assembled a new machine learning pipeline consisting of several well-established feature filtering/selection and classification techniques to analyze the 248-array dataset in order to construct classifier models that can separate earthworm samples into three groups: control, TNT-treated, and RDX-treated. First, a total of 869 genes differentially expressed in response to TNT or RDX exposure were identified using a univariate statistical algorithm of class comparison. Then, decision tree-based algorithms were applied to select a subset of 354 classifier genes, which were ranked by their overall weight of significance. A multiclass support vector machine (MC-SVM) method and an unsupervised K-means clustering method were applied to independently refine the classifier, producing smaller subsets of 39 and 30 classifier genes, respectively, with 11 common genes being potential biomarkers. The combined 58 genes were considered the refined subset and used to build MC-SVM and clustering models with classification accuracy of 83.5% and 56.9%, respectively. This study demonstrates that the machine learning approach can be used to identify and optimize a small subset of classifier/biomarker genes from high dimensional datasets and generate classification models of acceptable precision for multiple classes.
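The gene counts in the refinement step fit together by simple set arithmetic: two subsets of 39 and 30 genes sharing 11 members combine into 39 + 30 − 11 = 58 distinct genes. The sketch below checks this with placeholder gene identifiers, which are invented and do not correspond to real probes.

```python
# Placeholder identifiers chosen so that the two subsets overlap in 11 genes.
svm_genes = {f"gene{i}" for i in range(39)}           # 39 genes: gene0..gene38
clustering_genes = {f"gene{i}" for i in range(28, 58)}  # 30 genes: gene28..gene57

common_genes = svm_genes & clustering_genes    # genes selected by both methods
combined_genes = svm_genes | clustering_genes  # refined subset used for the models
```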


Posted in Computational biology

Cooperation under Indirect Reciprocity and Imitative Trust

Written on October 27, 2010 – 7:00 am

Indirect reciprocity, a key concept in behavioral experiments and evolutionary game theory, provides a mechanism that allows reciprocal altruism to emerge in a population of self-regarding individuals even when repeated interactions between pairs of actors are unlikely. Recent empirical evidence shows that humans typically follow complex assessment strategies involving both reciprocity and social imitation when making cooperative decisions. However, currently, we have no systematic understanding of how imitation, a mechanism that may also generate negative effects via a process of cumulative advantage, affects cooperation when repeated interactions are unlikely or information about a recipient's reputation is unavailable. Here we extend existing evolutionary models, which use an image score for reputation to track how individuals cooperate by contributing resources, by introducing a new imitative-trust score, which tracks whether actors have been the recipients of cooperation in the past. We show that imitative trust can co-exist with indirect reciprocity mechanisms up to a threshold, after which cooperation reverses, revealing the elusive nature of cooperation. Moreover, we find that when information about a recipient's reputation is limited, trusting the action of third parties towards her (i.e. imitating) does favor a higher collective cooperation compared to random-trusting and share-alike mechanisms. We believe these results shed new light on the factors favoring social imitation as an adaptive mechanism in populations of cooperating social actors.


Posted in Computer Science

How Wealth Accumulation Can Promote Cooperation

Written on October 27, 2010 – 7:00 am

Explaining the emergence and stability of cooperation has been a central challenge in biology, economics and sociology. Unfortunately, the mechanisms known to promote it either require elaborate strategies or hold only under restrictive conditions. Here, we report the emergence, survival, and frequent domination of cooperation in a world characterized by selfishness and a strong temptation to defect, when individuals can accumulate wealth. In particular, we study games with local adaptation such as the prisoner's dilemma, to which we add heterogeneity in payoffs. In our model, agents accumulate wealth and invest some of it in their interactions. The larger the investment, the more can potentially be gained or lost, so that present gains affect future payoffs. We find that cooperation survives for a far wider range of parameters than without wealth accumulation and, even more strikingly, that it often dominates defection. This is in stark contrast to the traditional evolutionary prisoner's dilemma in particular, in which cooperation rarely survives and almost never thrives. With the inequality we introduce, on the contrary, cooperators do better than defectors, even without any strategic behavior or exogenously imposed strategies. These results have important consequences for our understanding of the type of social and economic arrangements that are optimal and efficient.
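One way to picture the mechanism described above is a prisoner's dilemma round in which the payoff is multiplied by a stake drawn from the players' wealth, so that richer pairs gain or lose more. The sketch below is only an illustration under assumptions: the payoff values, the rule that the stake is a fixed fraction of the poorer player's wealth, and the name `play_round` are all hypothetical and are not taken from the paper's model.

```python
# Hypothetical per-unit-stake payoffs (row player, column player).
# They satisfy the prisoner's dilemma ordering T > R > P > S: 1.5 > 1.0 > 0.0 > -0.5.
PAYOFF = {
    ("C", "C"): (1.0, 1.0),
    ("C", "D"): (-0.5, 1.5),
    ("D", "C"): (1.5, -0.5),
    ("D", "D"): (0.0, 0.0),
}


def play_round(wealth_a, wealth_b, move_a, move_b, fraction=0.1):
    """Play one round and return both players' updated wealth.

    Assumption: both players stake the same amount, a fixed fraction of the
    poorer player's wealth, and payoffs scale linearly with that stake.
    """
    stake = fraction * min(wealth_a, wealth_b)
    gain_a, gain_b = PAYOFF[(move_a, move_b)]
    return wealth_a + stake * gain_a, wealth_b + stake * gain_b


# Two rounds of mutual cooperation: the second stake is larger than the first.
after_one = play_round(10.0, 10.0, "C", "C")
after_two = play_round(after_one[0], after_one[1], "C", "C")
```

Because the stake grows with accumulated wealth, gains from earlier rounds raise what later rounds can pay out, which is the sense in which present gains affect future payoffs.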


Posted in Computer Science

A Novel Side-Chain Orientation Dependent Potential Derived from Random-Walk Reference State for Protein Fold Selection and Structure Prediction

Written on October 27, 2010 – 7:00 am

Background

An accurate potential function is essential to attack protein folding and structure prediction problems. The key to developing efficient knowledge-based potential functions is to design reference states that can appropriately counteract generic interactions. The reference states of many knowledge-based distance-dependent atomic potential functions, however, were derived from non-interacting particles such as an ideal gas, which ignores the inherent sequence connectivity and entropic elasticity of proteins.

Methodology

We developed a new pair-wise distance-dependent, atomic statistical potential function (RW), using an ideal random-walk chain as reference state, which was optimized on CASP models and then benchmarked on nine structural decoy sets. We then incorporated a new side-chain orientation-dependent energy term into RW (RWplus) and found that the side-chain packing orientation specificity can further improve the decoy recognition ability of the statistical potential.
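Knowledge-based potentials of this kind are commonly built with the inverse Boltzmann relation, which turns the ratio of observed to reference-state pair counts in a distance bin into an energy. The sketch below shows that generic relation only. How RW derives its expected counts from the random-walk chain is not given in this abstract, so the expected count is simply an input here, and setting `kT` to 1 is an assumption.

```python
import math


def pair_energy(observed, expected, kT=1.0):
    """Inverse-Boltzmann energy for one atom-pair type in one distance bin.

    observed: count of the pair at this distance in known structures.
    expected: count predicted by the reference state (supplied by the caller).
    Negative energies mark distances seen more often than the reference predicts.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("counts must be positive")
    return -kT * math.log(observed / expected)


# Hypothetical counts for three distance bins of one atom-pair type.
favourable = pair_energy(observed=100, expected=50)   # enriched contact
neutral = pair_energy(observed=50, expected=50)       # matches the reference
unfavourable = pair_energy(observed=25, expected=50)  # depleted contact
```

The choice of reference state matters because it fixes `expected`: a poor reference shifts every energy, which is why the abstract stresses replacing the ideal-gas assumption.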

Significance

RW and RWplus demonstrate a significantly better ability than the best performing pair-wise distance-dependent atomic potential functions in both native and near-native model selections. They have higher energy-RMSD and energy-TM-score correlations compared with other potentials of the same type in real-life structure assembly decoys. When benchmarked with a comprehensive list of publicly available potentials, RW and RWplus show comparable performance to the state-of-the-art scoring functions, including those combining terms from multiple resources. These data demonstrate the usefulness of the random-walk chain as a reference state that correctly accounts for sequence connectivity and entropic elasticity of proteins, and they suggest potential usefulness in structure recognition and protein folding simulations. The RW and RWplus potentials, as well as the newly generated I-TASSER decoys, are freely available at http://zhanglab.ccmb.med.umich.edu/RW.


Posted in Computational biology

A Comparative Analysis of Gene-Expression Data of Multiple Cancer Types

Written on October 27, 2010 – 7:00 am

A comparative study of public gene-expression data of seven types of cancers (breast, colon, kidney, lung, pancreatic, prostate and stomach cancers) was conducted with the aim of deriving marker genes, along with associated pathways, that are either common to multiple types of cancers or specific to individual cancers. The analysis results indicate that (a) each of the seven cancer types can be distinguished from its corresponding control tissue based on the expression patterns of a small number of genes, e.g., 2, 3 or 4; (b) the expression patterns of some genes can distinguish multiple cancer types from their corresponding control tissues, potentially serving as general markers for all or some groups of cancers; (c) the proteins encoded by some of these genes are predicted to be blood secretory, thus providing potential cancer markers in blood; (d) the numbers of differentially expressed genes across different cancer types in comparison with their control tissues correlate well with the five-year survival rates associated with the individual cancers; and (e) some metabolic and signaling pathways are abnormally activated or deactivated across all cancer types, while other pathways are more specific to certain cancers or groups of cancers. The novel findings of this study offer considerable insight into these seven cancer types and have the potential to provide exciting new directions for diagnostic and therapeutic development.


Posted in Computational biology

In Silico Prediction of Human Pathogenicity in the γ-Proteobacteria

Written on October 27, 2010 – 7:00 am

Background

Although the majority of bacteria are innocuous or even beneficial for their host, others are highly infectious pathogens that can cause widespread and deadly diseases. When investigating the relationships between bacteria and other living organisms, it is therefore essential to be able to separate pathogenic organisms from non-pathogenic ones. Using traditional experimental methods for this purpose can be very costly and time-consuming, and also uncertain since animal models are not always good predictors for pathogenicity in humans. Bioinformatics-based methods are therefore strongly needed to mine the fast growing number of genome sequences and assess in a rapid and reliable way the pathogenicity of novel bacteria.

Methodology/Principal Findings

We describe a new in silico method for the prediction of bacterial pathogenicity, based on the identification in microbial genomes of features that appear to correlate with virulence. The method does not rely on identifying genes known to be involved in pathogenicity (for instance virulence factors), but rather it inherently builds families of proteins that, irrespective of their function, are consistently present in only one of the two kinds of organisms, pathogens or non-pathogens. Whether a new bacterium carries proteins contained in these families determines its prediction as pathogenic or non-pathogenic. The application of the method to a set of known genomes correctly classified the virulence potential of 86% of the organisms tested. An additional validation on an independent test-set correctly assigned 22 out of 24 bacteria.
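The prediction rule described above can be sketched with sets of protein-family identifiers: find the families seen in only one class of training genomes, then count how many of each kind a new genome carries. This is a simplified sketch under assumptions. It treats a family as class-specific only if it never appears in the other class, uses a plain majority count to decide, and all family labels and function names are invented; the paper's actual family construction and scoring are not described here.

```python
def exclusive_families(pathogen_genomes, non_pathogen_genomes):
    """Return (pathogen-only, non-pathogen-only) protein families.

    Each genome is a set of protein-family identifiers.
    """
    in_pathogens = set().union(*pathogen_genomes)
    in_non_pathogens = set().union(*non_pathogen_genomes)
    return in_pathogens - in_non_pathogens, in_non_pathogens - in_pathogens


def predict(genome, pathogen_only, non_pathogen_only):
    """Label a genome by which kind of class-specific family it carries more of."""
    pathogen_hits = len(genome & pathogen_only)
    non_pathogen_hits = len(genome & non_pathogen_only)
    return "pathogenic" if pathogen_hits > non_pathogen_hits else "non-pathogenic"


# Hypothetical training genomes described by invented family labels.
pathogens = [{"F1", "F2", "F3"}, {"F1", "F4"}]
non_pathogens = [{"F3", "F5"}, {"F5", "F6"}]

pathogen_only, non_pathogen_only = exclusive_families(pathogens, non_pathogens)
label_a = predict({"F1", "F4", "F5"}, pathogen_only, non_pathogen_only)
label_b = predict({"F3", "F6"}, pathogen_only, non_pathogen_only)
```

Note that the shared family `F3` carries no weight in either direction, which matches the idea that only families restricted to one kind of organism are informative.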

Conclusions

The proposed approach was demonstrated to go beyond the species bias imposed by evolutionary relatedness, and to perform better than predictors based solely on taxonomy or sequence similarity. A set of protein families that differentiate pathogenic and non-pathogenic strains was identified, including families of as yet uncharacterized proteins that are suggested to be involved in bacterial pathogenicity.


Posted in Computational biology

A General Model of Codon Bias Due to GC Mutational Bias

Written on October 27, 2010 – 7:00 am

Background

In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC content changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, G/C ending codons usage generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns.
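GC3, the measure mentioned above, is the fraction of codons whose third position is G or C. A minimal sketch of the calculation follows; the example sequence is invented and the function name `gc3` is an assumption.

```python
def gc3(coding_sequence):
    """Fraction of codons in a coding sequence that end in G or C.

    The sequence must be non-empty, in frame, and a multiple of three long.
    """
    seq = coding_sequence.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    third_positions = seq[2::3]  # every third base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Hypothetical sequence with codons ATG GCC AAA TTG (third bases G, C, A, G).
example_gc3 = gc3("ATGGCCAAATTG")
```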

Principal Findings

In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and humans, respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and humans, respectively.

Conclusions

The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.


Posted in Computational biology

Computer-Assisted 3D Kinematic Analysis of All Leg Joints in Walking Insects

Written on October 26, 2010 – 7:00 am

High-speed video can provide fine-scaled analysis of animal behavior. However, extracting behavioral data from video sequences is a time-consuming, tedious, subjective task. These issues are exacerbated where accurate behavioral descriptions require analysis of multiple points in three dimensions. We describe a new computer program written to assist a user in simultaneously extracting three-dimensional kinematics of multiple points on each of an insect's six legs. Digital video of a walking cockroach was collected in grayscale at 500 fps from two synchronized, calibrated cameras. We improved the legs' visibility by painting white dots on the joints, similar to techniques used for digitizing human motion. Compared to manual digitization of 26 points on the legs over a single, 8-second bout of walking (or 106,496 individual 3D points), our software achieved approximately 90% of the accuracy with 10% of the labor. Our experimental design reduced the complexity of the tracking problem by tethering the insect and allowing it to walk in place on a lightly oiled glass surface, but in principle, the algorithms implemented are extensible to free walking. Our software is free and open-source, written in the free language Python and including a graphical user interface for configuration and control. We encourage collaborative enhancements to make this tool both better and widely utilized.
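The figures quoted above are mutually consistent, which a short calculation makes explicit: 106,496 points at 26 points per frame is 4,096 frames, and 4,096 frames at 500 fps is about 8.2 seconds, in line with the "8-second" bout. The variable names below are chosen for this check only.

```python
points_per_frame = 26    # tracked leg points per video frame
frames_per_second = 500  # camera frame rate
total_points = 106_496   # individual 3D points reported

frames, remainder = divmod(total_points, points_per_frame)
duration_seconds = frames / frames_per_second
```

The frame count works out to exactly 4,096 (a power of two) with no remainder.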


Posted in Computer Science

A Computational Model of Limb Impedance Control Based on Principles of Internal Model Uncertainty

Written on October 26, 2010 – 7:00 am

Efficient human motor control is characterized by an extensive use of joint impedance modulation, which is achieved by co-contracting antagonistic muscles in a way that is beneficial to the specific task. While there is much experimental evidence available that the nervous system employs such strategies, no generally valid computational model of impedance control derived from first principles has been proposed so far. Here we develop a new impedance control model for antagonistic limb systems which is based on a minimization of uncertainties in the internal model predictions. In contrast to previously proposed models, our framework predicts a wide range of impedance control patterns, during stationary and adaptive tasks. This indicates that many well-known impedance control phenomena naturally emerge from the first principles of a stochastic optimization process that minimizes internal model prediction uncertainties, along with energy and accuracy demands. The insights from this computational model could be used to interpret existing experimental impedance control data from the viewpoint of optimality or could even govern the design of future experiments based on principles of internal model uncertainty.


Posted in Computer Science

Microbiome Profiling by Illumina Sequencing of Combinatorial Sequence-Tagged PCR Products

Written on October 26, 2010 – 7:00 am

We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than are required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors composed the vast majority of non-organism derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3 and V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity.
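The primer saving from combinatorial tagging comes from pairing tags: every forward tag can be combined with every reverse tag, so the number of distinguishable samples is the product of the two tag counts while the number of primers is only their sum. The sketch below illustrates this with 16 tags per end, a number chosen for the example rather than taken from the paper; the tag labels and the single-end comparison (one tagged primer per sample plus one shared untagged primer) are likewise assumptions.

```python
from itertools import product


def tag_pairs(forward_tags, reverse_tags):
    """Return every (forward, reverse) tag combination, one per sample."""
    return list(product(forward_tags, reverse_tags))


# Hypothetical tag labels: 16 tagged forward primers and 16 tagged reverse primers.
forward = [f"F{i:02d}" for i in range(16)]
reverse = [f"R{i:02d}" for i in range(16)]

pairs = tag_pairs(forward, reverse)
samples_supported = len(pairs)
primers_combinatorial = len(forward) + len(reverse)
# Assumed single-end design: one tagged primer per sample plus one shared primer.
primers_single_end = samples_supported + 1
```

With these example numbers, 32 primers distinguish 256 samples, whereas tagging a single end would need 257.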


Posted in Computational biology