Here we report the permanent draft genome sequences of two Rhodopirellula europaea strains. Strain SH398 (= IFAM 3246 = JCM 17614 = DSM 24039) was isolated by Heinz Schlesner from the Kiel Fjord, Germany (54.3297 N 10.1493 E) (Schlesner et al., 2004), while strain 6C (= JCM 17608 = DSM 24037) originates from Porto Cesareo, Italy (40.2598 N 17.8905 E) (Winkelmann and Harder, 2009). Other representatives of this species have also been found in the North Sea, in the English Channel and on the Greek coast. The genomic DNA of both strains was extracted using the FastDNA SPIN Kit
for Soil (MP Biomedicals, Germany), randomly sheared into fragments (shotgun sequencing) and transferred into 96-well plates, with 24 wells assigned to each strain. Sequencing was performed with Roche 454 Titanium pyrosequencing technology. The assembly was done with Newbler v. 2.3. Gene prediction was carried out using a combination of the Metagene (Noguchi et al., 2006) and Glimmer3 (Delcher et al., 2007) software packages. Ribosomal RNA genes were detected using the RNAmmer 1.2 software (Lagesen et al., 2007) and transfer RNAs using tRNAscan-SE (Lowe and Eddy, 1997). Batch cluster analysis was performed using the GenDB (version 2.2) system (Meyer et al., 2003). Annotation and data mining were done with the tool JCoast, version 1.7 (Richter et al., 2008), which collects for each coding
region observations from similarity searches against several sequence databases (NCBI-nr, Swiss-Prot, Kegg-Genes, genomesDB) (Richter et al., 2008) and against the protein family database InterPro (Mulder et al., 2005). Predicted protein coding sequences were automatically annotated by the software tool MicHanThi (Quast, 2006). Briefly, the MicHanThi software infers gene functions based on similarity searches against the NCBI-nr (including Swiss-Prot) and InterPro databases using
fuzzy logic. Genes of particular interest, such as sulfatases, were manually evaluated. Both genome sizes are in the range of previously reported Rhodopirellula baltica strains, with over 7 Mb and 6000 predicted open reading frames each. Pairwise analysis by reciprocal best match BLAST revealed 4700 shared genes between the two strains, with 4168 (6C) and 4376 (SH398) genes, respectively, being shared with the type strain R. baltica SH1T. This high number of shared genes reflects the close relationship between the two species as predicted by 16S rDNA and ANI analysis. Compared with all other species introduced in this article series, 997 and 1039 genes, respectively, appeared to be strain specific. The number of open reading frames encoding sulfatases, the outstanding feature of this genus, was found to be very similar in the genomes of R. europaea and R. baltica strains (Table 1) (Wegner et al., 2013).
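The reciprocal best match criterion used for the shared-gene counts can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the tuple layout (query, subject, bit score) and the toy gene identifiers are assumptions made for illustration, and the E-value or coverage cut-offs that a real analysis would apply are omitted because they are not stated here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (query_id, subject_id, bit_score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep, for every query, the subject with the highest bit score.

    On ties the hit encountered first is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return gene pairs (a, b) where b is the best hit of a and vice versa."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = []
    for gene_a, (gene_b, _score) in best_ab.items():
        # A pair counts as shared only if the best hit is mutual.
        if gene_b in best_ba and best_ba[gene_b][0] == gene_a:
            pairs.append((gene_a, gene_b))
    return sorted(pairs)


# Toy data (invented identifiers): a3 hits b1, but b1 prefers a1,
# so a3 has no reciprocal partner.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a2", 120.0), ("b2", "a1", 100.0)]

shared = reciprocal_best_hits(a_vs_b, b_vs_a)
print(shared)
```

In this sketch, genes without a mutual best hit in the other genome (such as a3 in the toy data) are the ones that would be counted as strain specific for that pairwise comparison.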