Large gene expansion in mutualistic polydnaviruses
Gene duplications have been proposed to be the main mechanism involved in genome evolution and in the acquisition of new functions. They therefore constitute an important source of innovation and adaptation. Gene duplications are particularly common in mutualistic viruses associated with parasitoid wasps. In these systems, the virus is integrated into the wasp genome, and virions injected into the parasitoids’ hosts are essential for successful parasitism. These viruses encode virulence factors involved in host immune suppression and developmental arrest. How did gene family expansion occur, and what evolutionary forces drove the divergence of gene copies? To understand the gene duplication and divergence mechanisms that operated during virus-wasp associations, we studied the protein tyrosine phosphatase (PTP) gene family, the largest gene family described in these viruses. Here, we show that viral PTP expansion occurred through three main mechanisms: duplication of large genomic segments (segmental duplication), and tandem and dispersed gene duplications within the viral genome. These duplication events became sources of evolutionary innovation that conferred adaptive properties on the wasps. Indeed, PTP gene copies were shown to undergo conservative evolution punctuated by episodes of adaptive evolution, which were correlated with duplication and wasp speciation events. Altogether, duplications and the subsequent evolution of gene copies likely contributed to the different patterns of PTP gene regulation and activity observed today. Given the essential role played by the virus in wasp parasitism success and the extraordinary expansion of PTP genes, we propose that PTP duplication contributed to wasp adaptation and diversification.
INTRODUCTION
Gene duplications have been recognized as an important source of evolutionary innovation and adaptation. The contribution of gene duplication to the evolution of new functional genes has been widely demonstrated in various organisms (Arguello et al., 2006; Katju and Lynch, 2003). Duplicated genes can usually be classified as tandem or dispersed duplicates, and duplicated copies are thought to evolve in ways that improve on the ancestral function. Two major questions arise when studying gene duplications: what molecular mechanisms underlie duplications, and how are duplicated copies maintained during evolution? Two main hypotheses have been proposed to explain the evolution of duplicated genes and the acquisition of new functions. Classical models propose that, after gene duplication, one copy evolves under purifying selection and conserves the parental function, whereas the extra copy is assumed to evolve neutrally (Force et al., 1999; Hughes, 1994). In contrast, alternative models propose that duplications are adaptive and that duplicated copies diverge under positive selection for the acquisition of novel functions (Bergthorsson et al., 2007; Des Marais and Rausher, 2008). Here we present a functional gene family encoded by a mutualistic virus involved in a host-parasitoid interaction. This family has undergone a particularly strong expansion and therefore constitutes a remarkable model for studying the evolutionary processes involved in gene duplication. The organization of genes into families is a common characteristic of viruses of the family Polydnaviridae, which are associated with parasitoid wasps. These gene families encode putative virulence factors, some of which have been shown to disrupt lepidopteran host physiology (Desjardins et al., 2007; Espagne et al., 2004; Lapointe et al., 2007; Webb et al., 2006).
In wasp-polydnavirus (PDV) associations, PDVs persist as stably integrated proviruses in the genome of their associated wasp (Desjardins et al., 2007; Bézier, 2008) and replicate only in the ovaries of females. Virus particles are injected into the lepidopteran host together with the wasp eggs during oviposition. PDVs do not replicate in the parasitized host insect, but viral gene products suppress the host immune system and cause physiological alterations that ensure parasitoid development (Asgari et al., 1996; Beckage and Gelman, 2004; Tanaka et al., 2000). This unique example of mutualism between a virus and a eukaryotic organism constitutes a remarkable evolutionary success, given the tens of thousands of parasitoid species that carry PDVs. All PDV-carrying wasps belong to the two separate lineages that make up the wasp superfamily Ichneumonoidea: Ichnoviruses (IVs) are associated with ichneumonid wasps and Bracoviruses (BVs) with braconid wasps (Turnbull and Webb, 2002). Recently, PDVs associated with Banchinae wasps, previously classified in the genus Ichnovirus, have been shown to be sufficiently distinct from both Ichnoviruses and Bracoviruses to justify the creation of a third PDV group (Lapointe et al., 2007). Each PDV genus has distinct morphological and packaging characteristics (Lapointe et al., 2007; Webb et al., 2006), and the absence of PDVs in basal ichneumonoid lineages suggests that IVs, BVs and Banchinae viruses arose independently in the three wasp lineages (Lapointe et al., 2007; Whitfield, 2002). Several PDV genomes have been sequenced to date, and all have large genomes segmented into multiple dsDNA circles. Another common and distinctive feature of PDVs is that putative genes encoding virulence factors are organized into multigene families (Desjardins et al., 2007; Espagne et al., 2004; Lapointe et al., 2007; Webb et al., 2006).
The diversification of virulence genes into families may reflect the adaptive pressures imposed on PDV genome evolution and underlines their role in wasp parasitism. The PDV-carrying braconid wasps form a monophyletic group, the Microgastroid complex, which was estimated, using a molecular clock calibrated with fossil records, to have arisen about 103 Mya from a single ancestral association between a bracovirus and a braconid wasp (Murphy et al., 2008). Genes encoding IκBs and protein tyrosine phosphatases (PTPs) are common to most sequenced bracoviruses, suggesting that they were acquired early in the course of wasp-bracovirus evolution. IκB genes, found in the three PDV lineages, encode inhibitors of the nuclear transcription factors involved in vertebrate and Drosophila immune responses (De Gregorio et al., 2001; Falabella et al., 2007; Hoffmann, 2003; Thoetkiattikul et al., 2005). PTP genes are absent from IVs, and the PTPs of Banchinae viruses form a distinct clade; the lack of evidence for a common ancestor of Banchinae and BV PTPs suggests that PTPs evolved separately in these two virus lineages (Lapointe et al., 2007). In all Bracovirus genomes described so far, PTPs form the largest gene family, with 27 members in Cotesia congregata Bracovirus (CcBV), 13 in Microplitis demolitor Bracovirus (MdBV) and at least 9 in the partly sequenced genome of Glyptapanteles indiensis Bracovirus (GiBV) (Desjardins et al., 2007; Espagne et al., 2004; Webb et al., 2006). PTPs are known to play a key role in controlling signal transduction pathways by dephosphorylating tyrosine residues on regulatory proteins (Andersen et al., 2001). All PDV PTPs studied so far are expressed in virus-infected hosts, but only a subset of these genes encodes catalytically functional PTPs (Ibrahim and Kim, 2008; Provost et al., 2004; Pruijssers and Strand, 2007).
The “inactive” PTPs have been suggested to act by trapping phosphorylated proteins, thereby competitively impairing cellular PTP activity (Provost et al., 2004). Moreover, PTP gene expression is regulated in a tissue-specific and time-dependent manner (Gundersen-Rindal and Pedroni, 2006; Ibrahim et al., 2007; Provost et al., 2004; Pruijssers and Strand, 2007). Bracovirus PTPs therefore appear to be important virulence factors that have undergone extensive expansion and strong functional divergence (Bézier et al., 2007). In this context, Bracovirus PTPs emerge as an attractive gene family model for studying the mechanisms and evolution of gene duplication. Complete or partial sequence data are now available for several Bracovirus genomes, providing a basis for understanding the genomic organization and transmission of duplications in a dynamic interaction between a parasitoid wasp, a virus and a lepidopteran host. Studying PTP gene family evolution enabled us to determine the molecular and evolutionary mechanisms at the origin of this family, and to highlight viral genome plasticity and the evolutionary forces underlying PTP diversity. We discuss the critical role of duplications and natural selection with respect to functional divergence and adaptation.
Materials and Methods
Wasp specimens
Fourteen PTP genes previously isolated from Bracoviruses associated with braconid wasps were studied: PTP P, Q, Y, K, L, C, α, S, M, E, X, H, R and ∆. These PTP genes were isolated from nine Cotesia species: C. congregata (laboratory-reared, France, J.-M. Drezen), C. chilonis (laboratory-reared, USA, R. Wiedenmann), C. flavipes (field-collected, Kenya, S. Dupas), C. glomerata (laboratory-reared, the Netherlands, L. Vet), C. melanoscela (field-collected, France, C. Villemant), C. marginiventris (laboratory-reared, USA, A. Joyce), C. vestalis (field-collected, Benin, T. Guilloux), C. rubecula (laboratory-reared, the Netherlands, H. Smid) and C. sesamiae (field-collected, Kenya, S. Dupas). All specimens were placed in 95% ethanol and stored at −20 °C until DNA extraction.
DNA extraction, amplification and sequencing
DNA was extracted with the Chelex method from two individuals per species, except for C. rubecula, for which a single individual was used. Briefly, individuals were ground in a 5% Chelex 100 resin (Bio-Rad) solution containing proteinase K (0.12 mg/ml), incubated at 56 °C for 30 min and then at 95 °C for 15 min, and the supernatants were collected. Primers were designed from the CcBV PTP sequences: each primer pair is specific to one PTP and flanks motifs 1 to 10, which characterize PTPs. Primer sequences are listed in Table 1. PCR stringency varied depending on whether the amplified PTP genes came from bracoviruses closely related to CcBV (annealing at 55 °C, 1.5 mM MgCl2) or from more distantly related ones (annealing at 45 °C, 3 mM MgCl2). One µl of DNA extract was used for each PCR reaction. The standard PCR program was: 95 °C for 2.5 min; 30 cycles of 95 °C for 30 s, annealing for 45 s and 72 °C for 60 s; and a final step at 72 °C for 5 min. The amplicons were purified with the QIAquick kit (Qiagen) and sequenced directly.
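As a worked check of the cycling program above, the hold times can be summed in a few lines of Python. This is a minimal sketch: ramp times between temperatures are ignored (an assumption, since they depend on the thermocycler), and the helper names `hold_time_seconds` and `pcr_conditions` are hypothetical, introduced only for illustration.

```python
"""Sketch: total hold time of the standard PCR program and the two stringency settings."""

# Step durations (seconds) taken from the program described in the text.
INITIAL_DENATURATION_S = 150  # 95 °C for 2.5 min
CYCLES = 30
CYCLE_STEPS_S = (30, 45, 60)  # 95 °C denaturation, annealing, 72 °C extension
FINAL_EXTENSION_S = 300       # 72 °C for 5 min


def hold_time_seconds(cycles: int = CYCLES) -> int:
    """Sum of hold times; ramping between temperatures is NOT modelled (assumption)."""
    per_cycle = sum(CYCLE_STEPS_S)  # 135 s per cycle
    return INITIAL_DENATURATION_S + cycles * per_cycle + FINAL_EXTENSION_S


def pcr_conditions(closely_related_to_ccbv: bool) -> dict:
    """Hypothetical helper returning the annealing temperature and MgCl2 used per case."""
    if closely_related_to_ccbv:
        return {"annealing_C": 55, "MgCl2_mM": 1.5}  # higher stringency
    return {"annealing_C": 45, "MgCl2_mM": 3.0}      # lower stringency


if __name__ == "__main__":
    total = hold_time_seconds()
    print(f"{total} s = {total / 60:.1f} min")  # prints "4500 s = 75.0 min"
```

Each cycle holds for 135 s, so the 30 cycles account for 4,050 s of the roughly 75 min program. Lowering the annealing temperature and raising the MgCl2 concentration both relax primer-template stringency, which is presumably why the second setting was reserved for more divergent templates.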
For most PTP genes, direct sequencing was possible, and the chromatograms showed no double peaks that would suggest a mixture of alleles. For PTP Cα and PTP EX, direct sequencing was not possible, so the PCR products were cloned (Qiagen cloning kit) and 10 clones per gene were sequenced. Sequencing reactions were performed with the BigDye Terminator Sequencing Kit (Perkin Elmer ABI) and analysed on an ABI PRISM 3100 Genetic Analyzer.
PTP duplication patterns
In order to understand the molecular mechanisms that led to the high number of PTP genes, we searched for homologous PTP genes in CcBV, CvBV (Cotesia vestalis Bracovirus), GiBV and MdBV. Thirty-seven PTP protein sequences from CvBV, 9 from GiBV, 13 from MdBV and 27 from CcBV were compared using BLAST, and genes were considered orthologous when their identity was higher than 80%. PTP genes were mapped within the virus genomes using the gene positions annotated in GenBank.
Sequence analysis and phylogeny
We isolated 87 PTP genes from Cotesia bracoviruses, and orthologous PTP sequences from the CcBV, CvBV, GiBV and MdBV genome sequencing projects were added to the analysis (Espagne et al., 2004; Gundersen-Rindal and Pedroni, 2006; Webb et al., 2006). GenBank accession numbers: CcBV PTP K, L, Q, P, M: AJ632304; CcBV PTP S, E, C: AJ632313; CcBV PTP α, X, Y: AJ632319; CcBV PTP H: AJ632307; CcBV PTP R: AJ632310; CvBV PTP2: AY871265; CvBV PTP3: AY651829; CvBV PTP6: AY651829; CvBV PTP10: AY651828; CvBV PTP11: DQ075354; GiBV PTP4: AY871265; GiBV PTP3: AY871265. Translated sequences were aligned manually in MacClade version 4.03 (Maddison, 2001) based on the conserved PTP motifs (Andersen et al., 2001). This alignment was used to construct a tree giving an overview of PTP evolution in the Microgastrinae. Using Modeltest version 2.2 (Posada and Crandall, 1998), the GTR+I+G model of sequence evolution was selected according to both the likelihood ratio test (LRT) and the Akaike information criterion (AIC). Bayesian MCMC analyses were performed on the entire data set using MrBayes version 3.1.2 (Ronquist and Huelsenbeck, 2003). Two independent analyses were run simultaneously for each data set, each consisting of 1,000,000 generations sampled every 1,000 generations, with four chains and uniform priors. Maximum parsimony (MP) analyses were performed using a heuristic search with random stepwise addition of sequences.
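The two criteria used in the duplication-pattern analysis, an identity threshold for calling orthologs and the position of paralogous copies on the viral genome segments, can be sketched as follows. This is an illustrative sketch, not the pipeline used here: identity is computed from an existing pairwise alignment rather than taken from BLAST output, and the rule that two paralogs are "tandem" when they are neighbours on the same segment with no other annotated gene between them, and "dispersed" otherwise, is a simplifying assumption. The names `percent_identity`, `is_orthologous`, `GeneLocus` and `classify_pair`, and all gene names and coordinates in the example, are hypothetical. Segmental duplications, which involve whole blocks of genes, are not detected by this pairwise rule.

```python
"""Sketch: ortholog calling by percent identity, and tandem vs dispersed paralog pairs."""
from dataclasses import dataclass


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity (%) between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are skipped; a gap facing a residue
    counts as a mismatch (a simplifying assumption).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    # Double-gap columns were removed, so a == b here always means identical residues.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_orthologous(aln_a: str, aln_b: str, threshold: float = 80.0) -> bool:
    """Orthology call as stated in the text: identity strictly higher than 80%."""
    return percent_identity(aln_a, aln_b) > threshold


@dataclass(frozen=True)
class GeneLocus:
    name: str
    segment: str  # viral genome segment (dsDNA circle) carrying the gene
    start: int    # start coordinate on that segment


def classify_pair(a: GeneLocus, b: GeneLocus, loci: list[GeneLocus]) -> str:
    """Return 'tandem' if a and b are neighbours on one segment, else 'dispersed'."""
    if a.segment != b.segment:
        return "dispersed"
    # Order all annotated genes on the shared segment by position.
    order = [g.name for g in sorted(loci, key=lambda g: g.start) if g.segment == a.segment]
    distance = abs(order.index(a.name) - order.index(b.name))
    return "tandem" if distance == 1 else "dispersed"


if __name__ == "__main__":
    # Hypothetical loci, for illustration only.
    loci = [
        GeneLocus("PTP_1", "seg_A", 100),
        GeneLocus("PTP_2", "seg_A", 1200),
        GeneLocus("ORF_x", "seg_A", 2300),
        GeneLocus("PTP_3", "seg_A", 3000),
        GeneLocus("PTP_4", "seg_B", 500),
    ]
    by_name = {g.name: g for g in loci}
    print(classify_pair(by_name["PTP_1"], by_name["PTP_2"], loci))  # tandem
    print(classify_pair(by_name["PTP_1"], by_name["PTP_3"], loci))  # dispersed
    print(classify_pair(by_name["PTP_1"], by_name["PTP_4"], loci))  # dispersed
    print(is_orthologous("MKLV-TYR", "MKLVSTYK"))  # False (6 of 8 columns = 75%)
```

Keeping the identity cut-off as a parameter makes it easy to check how sensitive the ortholog assignments are to the 80% threshold.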
Support values for internal nodes were estimated by non-parametric bootstrap resampling with 100 replicates. The PTP genes studied could be separated into three monophyletic subclades, each sharing conserved motifs. Each subclade was analysed independently, and tree topologies were obtained using MP in PAUP* 4.0b10 (Swofford, 2002) and Bayesian inference in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003). For MP analyses, we performed a heuristic search starting from stepwise-addition trees, replicated 10 times, using a simple input order of sequences to obtain the initial tree. The robustness of MP topologies was assessed by bootstrapping with 1,000 replicates (full heuristic search, each with 10 random stepwise additions). For Bayesian inference, the best substitution model was selected using Modeltest 3.7 (Posada and Crandall, 1998). When this model was not available in MrBayes, the closest generalisation of the selected model was used; for all subclades this turned out to be the GTR+I+G model. The data were partitioned by codon position. We performed a 1,000,000-generation run, sampled every 1,000 generations, with four incrementally heated chains. The burn-in period was estimated by plotting likelihood values against generation number; likelihoods had stabilized in all runs after 20,000 generations.
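Two of the operations described above act directly on alignment columns: partitioning nucleotide data by codon position, and drawing non-parametric bootstrap pseudo-replicates by resampling columns with replacement. The sketch below illustrates both, together with the usual definition of bootstrap support as the percentage of replicate trees that contain a given clade. It is a minimal illustration under stated assumptions, not a substitute for PAUP* or MrBayes: the alignment is assumed to be in frame, the tree search on each replicate is not implemented, and the function names and toy sequences are hypothetical.

```python
"""Sketch: codon-position partitions and non-parametric bootstrap of alignment columns."""
import random


def _alignment_length(alignment: dict[str, str]) -> int:
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same (non-zero count) length")
    return lengths.pop()


def partition_by_codon(alignment: dict[str, str]) -> list[dict[str, str]]:
    """Split an in-frame nucleotide alignment into 1st, 2nd and 3rd codon positions."""
    if _alignment_length(alignment) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3 (in-frame codons)")
    return [{name: seq[pos::3] for name, seq in alignment.items()} for pos in range(3)]


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One pseudo-replicate: sample as many columns as the original, with replacement."""
    n_cols = _alignment_length(alignment)
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    # The same sampled columns are used for every taxon, so sites stay aligned.
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: dict[str, str], n_replicates: int = 1000,
                         seed: int = 1) -> list[dict[str, str]]:
    """Generate pseudo-replicates; each would then go through the same tree search."""
    rng = random.Random(seed)
    return [bootstrap_replicate(alignment, rng) for _ in range(n_replicates)]


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Bootstrap support (%): share of replicate trees whose clade set contains `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    toy = {"taxon_1": "ATGCCCGAA", "taxon_2": "ATGCCAGAG", "taxon_3": "ATACCAGAG"}
    first, second, third = partition_by_codon(toy)
    print(third)  # {'taxon_1': 'GCA', 'taxon_2': 'GAG', 'taxon_3': 'AAG'}
    reps = bootstrap_replicates(toy, n_replicates=100)
    print(len(reps))  # 100
```

In the analyses reported here, partitioning by codon position applies to the Bayesian runs, whereas column resampling underlies the MP bootstrap values (100 replicates for the full data set, 1,000 for each subclade).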