Expanded details on sampling procedures, sample processing and protocols for DNA extraction, DNA amplification, and chemical analyses can be found in the Supplementary methods and Table B.1.
2.4.2 DNA sequence processing
DNA sequences were processed through a custom pipeline utilising components of UPARSE211 and QIIME212. An initial screening step was performed in mothur213 to remove abnormally short (<275 bp) and long (>345 bp) sequences. Sequences with long homopolymers (>6) were also removed. A total of 47,103,077 reads were quality filtered using USEARCH v7211 with a maximum expected error of 1 % (fastq_maxee=2.5) and truncated from the forward primer to 250 bp. Retained sequences (85.4 % of initial reads) were dereplicated and non-unique sequences removed. Next, reads were clustered to 97 % similarity and chimera checked using the cluster_otus command in USEARCH, and a de novo database was created of representative operational taxonomic units (OTUs). 93.2 % of the original pre-filtered sequences (truncated to 250 bp) mapped to these OTUs, and taxonomy was assigned using the Ribosomal Database Project Classifier214 (with a minimum confidence score of 0.5) against the SILVA 16S rRNA database (123 release, July 2015)215. The final read count was 43,202,089, with a mean of 43,905 reads per sample. Chloroplasts and mitochondrial reads were removed (1.0 and 0.5 % respectively of the final read count) and rarefaction was performed to 9,500 reads per sample. Consequently, 94 samples were then removed from the dataset (this included a set of samples collected temporally), leaving a final number of 925 individual samples (see Supplementary methods). This QC screen also resulted in the removal of one geothermal field from the study (n=17).
2.4.3 Statistical analyses
All statistical analyses and visualisation were performed in the R environment216 using phyloseq217, vegan218 and ggplot2219 packages. Alpha diversity was calculated using the estimate_richness function in phyloseq. A series of filtering criteria were applied to the 46 geochemical parameters measured in this study to identify metadata that significantly correlated with alpha diversity in these spring communities. First, collinear variables (Pearson correlation (|r|>0.7) were detected220. The best-fit linear regression between alpha diversity (using Shannon’s index) and each variable was used to pick a representative from each collinear group. This removed variables associating with the same effect in diversity.
Multiple linear regression was then applied to remaining variables, before and after a
stepwise Akaike information criterion (AIC) model selection was run221. Samples were also binned incrementally by pH (single pH units), temperature (10 °C increments), and geographic ranges (geothermal field; Figure B.1), with non-parametric Kruskal-Wallis (H) testing coefficient performed to identify any significant differences between groups. Finally, correlation of pH and temperature against Shannon diversity was calculated using Pearson’s coefficient |r|.
Bray-Curtis dissimilarity was used for all beta diversity comparisons. For ordination visualisations, a square-root transformation was applied to OTU relative abundances prior to non-metric multidimensional scaling (k=2) using the metaMDS function in the vegan package. ANOSIM (|R|) was used to compare beta diversity across the same pH, temperature and geographic groups (i.e., geothermal fields) used for alpha diversity analyses, followed by pairwise Wilcox testing with Bonferroni correction to highlight significance between individual groups. Linear regression was applied to pairwise geographic distances against spring community dissimilarities to assess the significance of distance-decay patterns. These comparisons were similarly performed on spring communities constrained to each geothermal field. A second series of filtering criteria was applied to geochemical parameters to identify metadata that significantly correlated with beta diversity. Mantel tests were performed between beta diversity and all 46 physicochemical variables using Spearman’s correlation coefficient (ρ) with 999 permutations. In decreasing order of correlation, metadata were added to a PERMANOVA analysis using the adonis function in vegan. Metadata significantly correlating with beta diversity (P<0.01) were assessed for collinearity using Pearson’s coefficient |r|221. In each collinear group (|r|>0.7), the variable with the highest mantel statistic was chosen as the representative. Low variant geochemical variables (SD<0.25 ppm) were then removed to allow a tractable number of explanatory variables for subsequent modelling. Constrained correspondence analysis (using the cca function in vegan) was then applied to OTUs, geothermal field locations and the reduced set of metadata. OTUs were first agglomerated to their respective genera (using the tax_glom function in phyloseq) and then low abundant taxa (<0.7 % of total mean taxon abundance) were removed. Typical geochemical signatures within each geothermal field were used to produce ternary diagrams of Cl-, SO42-, and HCO3- ratios using the ggtern package (v2.1.5)222.
Finally, to detect significant associations between taxa, geochemistry and other metadata (i.e., geothermal field observations), a multivariate linear model was applied to determine log
enrichment of taxa using edgeR223. To simplify the display of taxonomy in this model, we first agglomerated all OTUs to their respective genera or closest assigned taxonomy group (using the tax_glom function in phyloseq), and then only used taxa present in at least 5 % of samples and >0.1 % average relative abundance. Log fold enrichments of taxa were transformed into Z-scores and retained if absolute values were >1.96. Results were visualized using ggtree224. A phylogenetic tree was generated in QIIME by confirming alignment of representative OTU sequences using PyNAST225, filtering the alignment to remove positions which were gaps in every sequence and then building an approximately maximum-likelihood tree using FastTree226 with a midpoint root.
Figure 2.2 - Alpha and beta diversity as a function of pH and temperature. (a) pH against alpha diversity via Shannon index of all individual springs (n=925) in 10 °C increments, with linear regression applied to each increment. (b) Non-metric multidimensional scaling (NMDS) plot of beta diversity (via Bray-Curtis dissimilarities) between all individual microbial community structures sampled (n=925).