Journal of Genetics and Genomics

Omics tools for the needle out of haystack?

Zhiwei Cao, Xiu-Jie Wang, Shihua Zhang

2018, 45(7): 343-344. doi: 10.1016/j.jgg.2018.07.007

A comparison of next-generation sequencing analysis methods for cancer xenograft samples

Wentao Dai, Jixiang Liu, Quanxue Li, Wei Liu, Yi-Xue Li, Yuan-Yuan Li

2018, 45(7): 345-350. doi: 10.1016/j.jgg.2018.07.001

Abstract:
The application of next-generation sequencing (NGS) technology in cancer is influenced by the quality and purity of tissue samples. This issue is especially critical for patient-derived xenograft (PDX) models, which have proven to be by far the best preclinical tool for investigating human tumor biology, because the sensitivity and specificity of NGS analysis in xenograft samples are compromised by contaminating mouse DNA and RNA. Such contamination affects downstream analyses by causing inaccurate mutation calling and gene expression estimates. The reliability of NGS data analysis for cancer xenograft samples is therefore highly dependent on whether the sequencing reads derived from the xenograft can be distinguished from those originating from the host. That is, each sequence read needs to be accurately assigned to its species of origin. Here, we review currently available methodologies in this field, including Xenome, Disambiguate, bamcmp and pdxBlacklist, and provide guidelines for users.
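
The read-assignment problem described in this abstract can be illustrated with a minimal sketch. It assumes each read has already been aligned separately to the human and mouse references and that a per-read alignment score is available for each; the function name, the score dictionaries, and the tie-handling rule are hypothetical and do not reproduce the logic of Xenome, Disambiguate, bamcmp or pdxBlacklist.

```python
def assign_reads(human_scores, mouse_scores):
    """Assign each read to 'human', 'mouse', or 'ambiguous'.

    human_scores / mouse_scores map read IDs to alignment scores
    (higher is better). A read missing from one dictionary is treated
    as unaligned to that species. All names here are illustrative.
    """
    assignment = {}
    for read_id in human_scores.keys() | mouse_scores.keys():
        h = human_scores.get(read_id)
        m = mouse_scores.get(read_id)
        if m is None or (h is not None and h > m):
            assignment[read_id] = "human"      # graft-derived read
        elif h is None or m > h:
            assignment[read_id] = "mouse"      # host-derived read
        else:
            assignment[read_id] = "ambiguous"  # equal scores: cannot decide
    return assignment


if __name__ == "__main__":
    human = {"r1": 60, "r2": 10, "r3": 30}
    mouse = {"r2": 50, "r3": 30, "r4": 40}
    for read_id, species in sorted(assign_reads(human, mouse).items()):
        print(read_id, species)
```

In this sketch only reads labeled "human" would be passed on to mutation calling or expression quantification; how ambiguous reads are treated is a design choice that a real pipeline would need to make explicitly.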

MCENet: A database for maize conditional co-expression network and network characterization collaborated with multi-dimensional omics levels

Tian Tian, Qi You, Hengyu Yan, Wenying Xu, Zhen Su

2018, 45(7): 351-360. doi: 10.1016/j.jgg.2018.05.007

Abstract:
Maize (Zea mays) is the most widely grown grain crop in the world, playing important roles in agriculture and industry. However, the functions of maize genes remain largely unknown. High-quality genome-wide transcriptome datasets provide important biological knowledge, which has been widely and successfully used in plants not only for measuring gene expression levels but also for co-expression analysis to predict gene functions and modules related to agronomic traits. Recently, thousands of maize transcriptomic datasets have become available across different inbred lines, development stages, tissues, and treatments, or even across different tissue sections and cell lines. Here, we integrated 701 transcriptomic and 108 epigenomic datasets and studied the different conditional networks at multi-dimensional omics levels. We constructed a searchable, integrative, one-stop online platform, the maize conditional co-expression network (MCENet) platform. MCENet provides 10 global/conditional co-expression networks, 5 network accessional analysis toolkits (i.e., Network Search, Network Remodel, Module Finder, Network Comparison, and Dynamic Expression View) and multiple network functional support toolkits (e.g., motif and module enrichment analysis). We hope that our database will help plant research communities to identify maize functional genes or modules that regulate important agronomic traits. MCENet is publicly accessible at http://bioinformatics.cau.edu.cn/MCENet/.

Characterizing functional consequences of DNA copy number alterations in breast and ovarian tumors by spaceMap

Christopher J. Conley, Umut Ozbek, Pei Wang, Jie Peng

2018, 45(7): 361-371. doi: 10.1016/j.jgg.2018.07.003

Abstract:
We propose a novel conditional graphical model, spaceMap, to construct gene regulatory networks from multiple types of high dimensional omic profiles. A motivating application is to characterize the perturbation of DNA copy number alterations (CNAs) on downstream protein levels in tumors. Through a penalized multivariate regression framework, spaceMap jointly models high dimensional protein levels as responses and high dimensional CNAs as predictors. In this setup, spaceMap infers an undirected network among proteins together with a directed network encoding how CNAs perturb the protein network. spaceMap can be applied to learn other types of regulatory relationships from high dimensional molecular profiles, especially those exhibiting hub structures. Simulation studies show that spaceMap has greater power in detecting regulatory relationships than competing methods. Additionally, spaceMap includes a network analysis toolkit for biological interpretation of inferred networks. We apply spaceMap to the CNA, gene expression and proteomics data sets from the CPTAC-TCGA breast and ovarian cancer studies. Each cancer exhibits disruption of ‘ion transmembrane transport’ and ‘regulation from RNA polymerase II promoter’ by CNA events unique to each cancer. Moreover, using protein levels as a response yields a more functionally-enriched network than using RNA expression in both cancer types. The network results also help to pinpoint crucial cancer genes and provide insights into the functional consequences of important CNAs in breast and ovarian cancers. The R package spaceMap, including vignettes and documentation, is hosted at https://topherconley.github.io/spacemap.

Systematic identification and annotation of multiple-variant compound effects at transcription factor binding sites in human genome

Si-Jin Cheng, Shuai Jiang, Fang-Yuan Shi, Yang Ding, Ge Gao

2018, 45(7): 373-379. doi: 10.1016/j.jgg.2018.05.005

Abstract:
Understanding the functional effects of genetic variants is crucial in modern genomics and genetics. Transcription factor binding sites (TFBSs) are among the most important cis-regulatory elements. While multiple tools have been developed to assess functional effects of genetic variants at TFBSs, they usually assume that each variant works in isolation and neglect the potential “interference” among multiple variants within the same TFBS. In this study, we present COPE-TFBS (Context-Oriented Predictor for variant Effect on Transcription Factor Binding Site), a novel method that considers sequence context to accurately predict variant effects on TFBSs. We systematically re-analyzed the sequencing data from both the 1000 Genomes Project and the Genotype-Tissue Expression (GTEx) Project via COPE-TFBS, and identified a number of novel TFBSs, transformed TFBSs and discordantly annotated TFBSs resulting from multiple variants, further highlighting the necessity of sequence context in accurately annotating genetic variants. COPE-TFBS is freely available for academic use at http://cope.cbi.pku.edu.cn/.

PhoPepMass: A database and search tool assisting human phosphorylation peptide identification from mass spectrometry data

Menghuan Zhang, Hui Cui, Lanming Chen, Ying Yu, Michael O. Glocker, Lu Xie

2018, 45(7): 381-388. doi: 10.1016/j.jgg.2018.07.005

Abstract:
Protein phosphorylation, one of the most important protein post-translational modifications, is involved in various biological processes, and the identification of phosphorylation peptides (phosphopeptides) and their corresponding phosphorylation sites (phosphosites) will facilitate the understanding of the molecular mechanism and function of phosphorylation. Mass spectrometry (MS) provides a high-throughput technology that enables the identification of large numbers of phosphosites. PhoPepMass is designed to assist human phosphopeptide identification from MS data based on a specific database of phosphopeptide masses and a multivariate hypergeometric matching algorithm. It contains 244,915 phosphosites from several public sources. Moreover, the accurate masses of peptides and fragments with phosphosites were calculated. It is the first database that provides a systematic resource for the query of phosphosites on peptides and their corresponding masses. This allows researchers to search for proteins whose phosphosites have been reported, to browse detailed phosphopeptide and fragment information, to match masses from MS analyses within a defined threshold to the corresponding phosphopeptides, and to compare proprietary phosphopeptide discovery results with results from previous studies. Additionally, database search software was created and a “two-stage search strategy” is suggested to identify phosphopeptides from tandem mass spectra of proteomics data. We expect PhoPepMass to be a useful tool and a source of reference for proteomics researchers. PhoPepMass is available at https://www.scbit.org/phopepmass/index.html.
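
The threshold-based mass matching mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the PhoPepMass implementation: the reference entries, the function names, and the choice of a parts-per-million tolerance are assumptions, and the multivariate hypergeometric scoring step is not covered. The only fixed quantity is the monoisotopic mass added per phosphate group (HPO3, about 79.96633 Da).

```python
from bisect import bisect_left, bisect_right

PHOSPHO_SHIFT = 79.96633  # monoisotopic mass of HPO3 in Da


def add_phospho(unmodified_mass, n_sites):
    """Mass of a peptide carrying n_sites phosphate groups."""
    return unmodified_mass + n_sites * PHOSPHO_SHIFT


def match_masses(observed, reference, tol_ppm=10.0):
    """Match observed masses to reference phosphopeptide masses.

    reference is a list of (name, mass) pairs (hypothetical entries);
    tol_ppm is an assumed tolerance, applied relative to each observed mass.
    Returns a dict mapping each observed mass to the matching names.
    """
    ref = sorted(reference, key=lambda entry: entry[1])
    masses = [mass for _, mass in ref]
    hits = {}
    for obs in observed:
        delta = obs * tol_ppm / 1e6
        lo = bisect_left(masses, obs - delta)   # first mass >= lower bound
        hi = bisect_right(masses, obs + delta)  # first mass > upper bound
        hits[obs] = [ref[i][0] for i in range(lo, hi)]
    return hits


if __name__ == "__main__":
    # Hypothetical unmodified masses; one phosphosite added to each.
    reference = [
        ("pepA+1P", add_phospho(1000.0, 1)),
        ("pepB+1P", add_phospho(1000.5, 1)),
    ]
    print(match_masses([1079.9670], reference, tol_ppm=10.0))
```

Sorting the reference masses once and using binary search keeps each lookup logarithmic in the size of the reference table, which matters when the table holds hundreds of thousands of entries.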

The DrugPattern tool for drug set enrichment analysis and its prediction for beneficial effects of oxLDL on type 2 diabetes

Chuanbo Huang, Weili Yang, Junpei Wang, Yuan Zhou, Bin Geng, Georgios Kararigas, Jichun Yang, Qinghua Cui

2018, 45(7): 389-397. doi: 10.1016/j.jgg.2018.07.002

Abstract:
Enrichment analysis methods, e.g., gene set enrichment analysis, represent one class of important bioinformatics resources for mining patterns in biomedical datasets. However, tools for inferring patterns and rules from a list of drugs are limited. In this study, we developed a web-based tool, DrugPattern, for drug set enrichment analysis. We first collected and curated 7019 drug sets, including indications, adverse reactions, targets, pathways, etc. from public databases. For a list of drugs of interest, DrugPattern then evaluates the significance of the enrichment of these drugs in each of the 7019 drug sets. To validate DrugPattern, we employed it for the prediction of the effects of oxidized low-density lipoprotein (oxLDL), a factor expected to be deleterious. We predicted that oxLDL has beneficial effects on some diseases, most of which were supported by evidence in the literature. Because DrugPattern predicted potential beneficial effects of oxLDL in type 2 diabetes (T2D), animal experiments were then performed to further verify this prediction. The experimental evidence validated the DrugPattern prediction that oxLDL indeed has beneficial effects on T2D in the case of energy restriction. These data confirmed the prediction accuracy of our approach and revealed unexpected protective roles for oxLDL in various diseases. This study provides a tool to infer patterns and rules in biomedical datasets based on drug set enrichment analysis. DrugPattern is available at http://www.cuilab.cn/drugpattern.
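
Drug set enrichment of the kind described in this abstract can be illustrated with a one-sided hypergeometric test. The abstract does not state which statistical test DrugPattern uses, so the test choice here is an assumption, as are the function names and the toy drug sets; no multiple-testing correction is applied in this sketch.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, set_size, query_size, overlap):
    """P(X >= overlap) when query_size drugs are drawn without replacement
    from a universe containing set_size members of the drug set."""
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(query, drug_sets, universe):
    """Rank drug sets by enrichment p-value for the query drug list.

    drug_sets maps a set name to its member drugs (hypothetical data).
    Returns (name, overlap, p_value) tuples sorted by ascending p-value.
    """
    universe = set(universe)
    query = set(query) & universe
    results = []
    for name, members in drug_sets.items():
        members = set(members) & universe
        overlap = len(query & members)
        p = hypergeom_enrichment_p(len(universe), len(members), len(query), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda item: item[2])


if __name__ == "__main__":
    universe = [f"drug{i}" for i in range(20)]
    drug_sets = {
        "indication_X": ["drug0", "drug1", "drug2", "drug3"],
        "target_Y": ["drug10", "drug11", "drug12", "drug13"],
    }
    for row in enrich(["drug0", "drug1", "drug2", "drug15"], drug_sets, universe):
        print(row)
```

With thousands of drug sets tested at once, a real analysis would normally adjust the resulting p-values for multiple comparisons before interpreting them.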

LncPipe: A Nextflow-based pipeline for identification and analysis of long non-coding RNAs from RNA-Seq data