Ingestible capsule sampling device

The capsule sampling device (CapScan, Envivo Bio) consists of a hollow elastic collection bladder capped by a one-way valve. The device is prepared for packaging by evacuating the collection bladder, folding it in half and placing the folded device inside a dissolvable capsule measuring 6.5 mm in diameter and 23 mm in length, onto which an enteric coating is applied. The capsule and the enteric coating prevent contamination of the collection bladder by oropharyngeal and gastric microorganisms during ingestion. When the device reaches the target pH, the enteric coating and capsule disintegrate. The target pH is 5.5 for type 1, 6 for type 2 and 7.5 for types 3 and 4; type 4 also has a time-delay coating to bias collection towards the ascending colon. After the enteric coating disintegrates, the collection bladder unfolds and expands into a tube 6 mm in diameter and 33 mm in length, thereby drawing up to 400 µl of gut luminal contents in through the one-way valve. The one-way valve maintains the integrity of the collected sample as the device moves through the colon and is exposed to stool.

In this study, participants concurrently ingested sets of four devices, each with a distinct coating targeting either the proximal to medial regions of the small intestine (coating types 1 and 2) or more distal regions (coating types 3 and 4). After sampling, the devices were passed with stool into specimen-collection containers, which were immediately frozen. After the sampling period was complete, the stool was thawed and the devices were retrieved by study staff. The elastic collection bladders were rinsed in 70% isopropyl alcohol and punctured with a sterile hypodermic needle attached to a 1-ml syringe for sample removal. Samples were transferred into microcentrifuge tubes, and the pH was measured with an InLab Ultra Micro ISM pH probe (Mettler Toledo). A 40-µl aliquot was centrifuged for 3 min at 10,000 RCF. The supernatant was used for metabolomics analysis and the pellet for proteomics analysis. The remainder of the sample was frozen until it was thawed for DNA extraction.

Study design

The study was approved by the WIRB-Copernicus Group institutional review board (study 1186513), and informed consent was obtained from each participant. Healthy volunteers were selected to exclude participants with clinically detectable gastrointestinal conditions or diseases that could interfere with data acquisition and interpretation. No blinding or randomization was performed, and no statistical methods were used to predetermine sample size.

Participants met all of the following inclusion criteria: (1) age between 18 and 70 years; (2) American Society of Anesthesiologists (ASA) physical status class 1 or 2; (3) for women of childbearing potential, a negative urine pregnancy test within 7 days of the screening visit and willingness to use contraception for the entire study period; and (4) fluency in English, understanding of the study protocol, ability to provide written informed consent and compliance with study requirements.

Individuals with any of the following conditions or characteristics were excluded from the study: (1) a history of any of the following: prior gastric or oesophageal surgery, including lap banding or bariatric surgery, bowel obstruction, gastric outlet obstruction, diverticulitis, IBD, ileostomy or colostomy, gastric or oesophageal cancer, achalasia, oesophageal diverticulum, active dysphagia or odynophagia, or active medication use for any gastrointestinal conditions; (2) pregnancy or planned pregnancy within 30 days of the screening visit or breast-feeding; (3) any form of active substance abuse or dependence (including drug or alcohol abuse), any unstable medical or psychiatric disorder, or any chronic condition that might, in the opinion of the investigator, interfere with conduct of the study; or (4) a clinical condition that, in the judgment of the investigator, could potentially pose a health risk to the individual while they were involved in the study.

Fifteen healthy individuals were enrolled in this study, and each swallowed at least 17 devices over the course of 3 days (for demographics, see Supplementary Table 1). Daily instructions included the following guidelines: record all foods consumed and the time of consumption throughout the day; if you work out, do so in the morning; eat breakfast and lunch as usual; swallow a set of four devices 3 h after lunch with up to two-thirds of a cup of water; do not eat or drink anything for at least 2 h after swallowing the devices; if hungry after 2 h, snack lightly (up to 200 calories); do not drink any caffeinated beverages after lunch until the next morning; collect all stool starting 6 h after swallowing this set of devices until 48 h after swallowing the next set of devices; eat dinner as usual at least 6 h after lunch; swallow a set of four devices 3 h after dinner with two-thirds of a cup of water; after swallowing this set, do not eat or drink anything until the morning. Alcohol consumption and diet contents were not restricted. All ingested devices were recovered, and no adverse events were reported during the study. Of the 255 ingested devices, 15 were set 1 safety devices (not included in the analysis) and 22 contained gas or an insufficient sample volume. Saliva samples were collected after evening meals and immediately frozen at −20 °C. A sample of every bowel movement during the study was immediately frozen by the participant at −20 °C. A total of 306 samples (n = 29 saliva, n = 218 device and n = 59 stool samples) provided sufficient material for multi-omic analyses (Extended Data Fig. 2a). Participant 1 also provided additional samples for assessment of replicability and microbial blooming.

Blooming analysis

To assess the effect of in-body incubation on device contents between sample collection and sample retrieval, participant 1 ingested a set of four devices (one of each type). The devices were passed in a bowel movement 32 h after ingestion. They were then immediately transferred to an anaerobic chamber and incubated at 37 °C. An aliquot of each sample was taken at 32 h (immediately after the bowel movement), 58 h and 87 h (the latter two time points simulating longer gut transit times) and subjected to 16S rRNA gene amplicon sequencing. Relative to 32 h, the 30 most abundant ASVs shifted by a median of 8–16 ranks at 58 h and 12–30 ranks at 87 h (Extended Data Fig. 1). The 9–17 ASVs that increased from below to above the detection limit during incubation, presumably because of growth, collectively accounted for a relative abundance of 9.4–13.8% after 58 h and 5.2–18% after 87 h. Thus, although outgrowth can alter assessments of microbiota composition, major changes are not expected for transit times of ~58 h or less.
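The two blooming metrics above can be sketched as follows. The first is the median rank shift of the top ASVs; the second is the summed abundance of ASVs that emerge above the detection limit. This is a minimal illustration assuming per-ASV read counts stored as dictionaries. The tie-breaking rule, the rank assigned to ASVs absent at the later time point and the default detection limit of zero reads are all assumptions, not the study's exact procedure.

```python
from statistics import median


def ranks(counts):
    """Rank ASVs by read count (1 = most abundant); ties broken by ASV name."""
    ordered = sorted(counts, key=lambda asv: (-counts[asv], asv))
    return {asv: i + 1 for i, asv in enumerate(ordered)}


def median_rank_shift(counts_ref, counts_later, top_n=30):
    """Median absolute rank change of the top_n ASVs at the reference time point."""
    r0, r1 = ranks(counts_ref), ranks(counts_later)
    top = sorted(r0, key=r0.get)[:top_n]
    # Assumption: an ASV absent later is ranked just after the last observed ASV.
    absent_rank = len(r1) + 1
    return median([abs(r1.get(asv, absent_rank) - r0[asv]) for asv in top])


def emergent_fraction(counts_ref, counts_later, detection_limit=0):
    """Relative abundance, at the later time point, of ASVs undetected at the reference."""
    total = sum(counts_later.values())
    emergent = sum(
        n for asv, n in counts_later.items()
        if n > detection_limit and counts_ref.get(asv, 0) <= detection_limit
    )
    return emergent / total
```

For example, if ASV "D" is absent at 32 h but has 10 of 100 reads at 58 h, `emergent_fraction` reports 0.1 for that device.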

Time-lapse imaging

Agarose (1%) pads with BHI medium were sealed using VALAP (1:1:1 Vaseline:lanolin:paraffin) and maintained at 37 °C using a heated environmental chamber (HaisonTech). Phase-contrast images were collected on a Nikon Ti-E epifluorescence microscope using µManager (v.1.4)42.

DNA extraction and 16S rRNA gene sequence analysis

Of the 240 devices (excluding the 15 safety devices), 218 collected >50 µl of intestinal fluid and were subjected to DNA extraction and to 16S rRNA gene and metagenomic sequencing. The remainder collected <50 µl or were filled with gas, most likely from the colon.

DNA was extracted with a Microbial DNA extraction kit (Qiagen)43 from 50 µl of sample from each of the 218 devices that collected >50 µl, from 200 µl of saliva and from 100 mg of stool.

16S rRNA amplicons were generated using Earth Microbiome Project-recommended 515F/806R primer pairs and 5PRIME HotMasterMix (Quantabio, cat. no. 2200410) with the following programme in a thermocycler: 94 °C for 3 min; 35 cycles of 94 °C for 45 s, 50 °C for 60 s and 72 °C for 90 s; and 72 °C for 10 min. PCR products were cleaned, quantified and pooled using the UltraClean 96 PCR Cleanup kit (Qiagen, cat. no. 12596-4) and Quant-iT dsDNA High-Sensitivity Assay kit (Invitrogen, cat. no. Q33120). Samples were sequenced with 250-bp reads on a MiSeq instrument (Illumina).

Sequence data were demultiplexed using Illumina bcl2fastq software at the Chan Zuckerberg Biohub Sequencing facility. Subsequent processing was performed in the R statistical computing environment (v.4.0.3)44 with DADA2 as previously described43, using pseudo-pooling45. The truncLenF and truncLenR parameters were set to 250 and 180, respectively. Taxonomy was assigned using the Silva rRNA database (v.132)46. Samples with >2,500 reads were retained for analyses. We obtained sufficient sequencing reads from 210 device samples, which were the focus of subsequent analyses. We also obtained sequencing data from 29 saliva and 58 stool samples: one participant provided only one saliva sample, and one stool sample had insufficient sequencing reads (Extended Data Fig. 2a).

A phylogenetic tree was constructed using phangorn as previously described47. Shannon diversity was calculated using the phyloseq::estimate_richness function, which is a wrapper for the vegan::diversity function48,49. Because intestinal samples were often dominated by a single ASV (Fig. 2c), the Canberra distance metric was used for pairwise beta-diversity comparisons. Only the 455 ASVs represented by ≥3 reads in ≥5% of samples were used to calculate distances.
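The diversity steps above can be sketched in Python. The study used phyloseq and vegan in R; this version uses NumPy and SciPy as stand-ins. It assumes a samples × ASVs count matrix and shows three steps:

* the prevalence filter (≥3 reads in ≥5% of samples);
* the Shannon index, with the natural log, as in vegan's default;
* pairwise Canberra distances.

SciPy's Canberra distance is unnormalized, so absolute values may differ by a scaling factor from the R implementation used in the study.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def prevalence_filter(counts, min_reads=3, min_fraction=0.05):
    """Keep ASVs (columns) with >= min_reads reads in >= min_fraction of samples (rows)."""
    counts = np.asarray(counts)
    keep = (counts >= min_reads).mean(axis=0) >= min_fraction
    return counts[:, keep], keep


def shannon(counts):
    """Shannon index (natural log) for each sample (row) of a count matrix."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log(p), 0.0)  # 0 * log(0) treated as 0
    return terms.sum(axis=1)


def canberra_matrix(counts):
    """Square matrix of pairwise Canberra distances between samples."""
    return squareform(pdist(np.asarray(counts, dtype=float), metric="canberra"))
```

In the Canberra distance, each ASV term is scaled by that ASV's own abundance. A single dominant ASV therefore does not swamp the comparison, which matches the rationale given above.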

Limitations and contamination analysis

One limitation of our study is that the exact location of sample collection within the intestines could not be clearly defined or validated. Variability in intestinal peristalsis and pH during normal digestion may cause devices within a set to experience different pH gradients; hence, they may open before or after their intended collection sites. Despite this limitation, analysis of 210 intestinal samples from 15 individuals showed consistent trends of biochemical and microbial activity in the human intestines. More consistent sampling along a longitudinal gradient might be attained in future studies by collecting multiple sequential samples into a single device in a timed manner.

The significantly different bile acid profiles in intestinal compared with stool samples indicate that it is unlikely that stool contaminated the intestinal sampling devices during transit or sample recovery. However, because of the large increase in microbial density along the intestinal tract37, even a minute amount of stool contamination could affect microbiota composition. We therefore used a statistical approach to identify samples as potentially contaminated on the basis of microbial community composition. Given the directional motility of the intestinal tract, one would expect intrinsic overlap between intestinal and stool microbial communities. Latent Dirichlet allocation with the topicmodels R package50 was used to identify co-occurring groups of microorganisms (‘topics’51) from intestinal and stool samples for each participant. For each intestinal sample, the cumulative probability of topics identified as derived from the same participant’s stool was computed. Device samples with ≥10% of the total community identified as potentially originating from stool topics were flagged as possibly contaminated. Using this very conservative definition, 38 of the 210 intestinal samples with adequate sequencing read counts (originating from 12 of the study participants) were identified as possibly contaminated. All analyses presented in this study used all available data to avoid bias, but re-analysis of all data after removing the 38 samples that showed any signal of potential contamination from stool resulted in the same statistical trends as with the complete group of samples.
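The flagging rule above can be sketched as follows, assuming per-sample topic proportions are already available from an LDA fit (the study used the topicmodels R package). The rule for deciding which topics are "stool-derived" is a hypothetical placeholder: a topic is stool-derived if its mean weight is higher in the participant's stool samples than in their device samples. The ≥10% threshold is taken from the text.

```python
import numpy as np


def stool_topics(theta_stool, theta_device):
    """Hypothetical rule: indices of topics with higher mean weight in stool than in devices.

    theta_*: samples x topics matrices of LDA topic proportions (rows sum to 1).
    """
    return np.flatnonzero(np.mean(theta_stool, axis=0) > np.mean(theta_device, axis=0))


def flag_possible_contamination(theta_device, stool_topic_idx, threshold=0.10):
    """Cumulative stool-topic probability per device sample, and a >= threshold flag."""
    share = np.asarray(theta_device)[:, stool_topic_idx].sum(axis=1)
    return share, share >= threshold
```

A device sample with 30% of its community assigned to stool topics is flagged, whereas one with 5% is not.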

Metagenomic sequencing

Extracted DNA from all samples was arrayed in a 384-well plate, and concentrations were normalized after quantification with the PicoGreen dsDNA Quantitation kit (ThermoFisher). DNA was added to a tagmentation reaction, incubated for 10 min at 55 °C and immediately neutralized. The mixtures were then subjected to ten cycles of PCR to append Illumina primers and identification barcodes, which allowed samples to be pooled for sequencing. One microlitre from each well was pooled, and the pooled library was purified twice with AMPure XP beads to select fragments of appropriate size. Finally, library concentration was quantified using a Qubit instrument (ThermoFisher). Sequencing was performed on a NovaSeq instrument with an S4 flow cell and read lengths of 2 × 146 bp.

Preprocessing of raw sequencing reads and metagenomic assembly

Skewer (v.0.2.2)52 was used to remove Illumina adaptors, after which human reads were removed with Bowtie2 (v.2.4.1)53. Metagenomic reads from each saliva, intestinal or stool sample were assembled individually with MEGAHIT (v.1.2.9)54. Assembled contigs were binned with MetaBAT 2 (v.2.15)55 into 7,655 genome bins. CheckM (v.1.1.3)56 and QUAST (v.5.0.2)57 were used to assess bin quality. Bins with >75% completeness and <25% contamination were dereplicated at 99% ANI (strain level) with dRep (v.3.0.0)58, resulting in 696 representative MAGs across all samples. GTDB-Tk was used to assign taxonomy59. Default parameters were used for all computational tools.
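The quality filter and dereplication step can be illustrated with a short sketch. It applies the stated thresholds (>75% completeness, <25% contamination) and then keeps one representative per group of bins sharing ≥99% ANI. Two parts of this sketch are assumptions and do not reproduce dRep's algorithm:

* the quality score, completeness − 5 × contamination, is a hypothetical weighting;
* the clustering uses a simplified greedy scheme rather than dRep's hierarchical clustering.

```python
from dataclasses import dataclass


@dataclass
class GenomeBin:
    name: str
    completeness: float   # percent, e.g. from CheckM
    contamination: float  # percent, e.g. from CheckM


def passes_quality(b, min_completeness=75.0, max_contamination=25.0):
    return b.completeness > min_completeness and b.contamination < max_contamination


def dereplicate(bins, ani, threshold=0.99):
    """Greedily keep the best-scoring bin from each group sharing >= threshold ANI.

    ani maps frozenset({name_a, name_b}) to pairwise ANI as a fraction;
    pairs missing from the mapping are treated as unrelated (ANI 0).
    """
    candidates = [b for b in bins if passes_quality(b)]
    # Hypothetical score; dRep's own scoring uses additional terms.
    candidates.sort(key=lambda b: b.completeness - 5 * b.contamination, reverse=True)
    representatives = []
    for b in candidates:
        if all(ani.get(frozenset((b.name, r.name)), 0.0) < threshold for r in representatives):
            representatives.append(b)
    return [r.name for r in representatives]
```

Sorting by score first ensures that, within a strain-level group, the cleanest and most complete bin becomes the representative.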

Strain isolation from intestinal and stool samples

Isolates were obtained directly from samples, or from communities derived by passaging samples60, by either plating or fluorescence-activated cell sorting (FACS)61. For plating, samples were serially diluted tenfold, plated onto BHI + 10% defibrinated horse blood (BHI-blood) plates and incubated for 72 h at 37 °C in an anaerobic chamber. Single colonies were re-streaked onto BHI-blood plates, and this process was repeated twice more to ensure that each colony was axenic. Single colonies were then picked into a 2-ml deep-well plate containing 500 µl of BHI supplemented with menadione (vitamin K), cysteine and hemin (BHIS). In certain cases, Reinforced Clostridial Medium supplemented with menadione (vitamin K), cysteine and hemin (RCMS) was used instead. For FACS, single cells were sorted into BHIS using a previously described protocol that allows for isolation of strict anaerobes61.

After 72 h of growth in an anaerobic chamber at 37 °C, frozen stocks of all isolates were made using a final concentration of 12% glycerol. To identify isolates, cultures were spun down and pellets were resuspended with PCR-grade water in a 1:1 ratio. The primers 5′-AGAGTTTGATCCTGGCTCAG-3′ and 5′-GACGGGCGGTGWGTRCA-3′ were used to amplify the 16S rRNA gene. The PCR product was sent for Sanger sequencing, and sequences were filtered using sangeranalyseR with default parameters62. These sequences were searched against the rRNA/ITS BLAST database, and the top species hit was used to identify the strain.
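The reverse primer above contains IUPAC degenerate bases: W stands for A or T, and R stands for A or G. The following sketch expands a degenerate primer into the concrete sequences it represents. It is a generic illustration, not part of the study's pipeline.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a (possibly degenerate) primer."""
    return ["".join(bases) for bases in product(*(IUPAC[c] for c in primer.upper()))]


forward = expand_degenerate("AGAGTTTGATCCTGGCTCAG")  # no degenerate positions
reverse = expand_degenerate("GACGGGCGGTGWGTRCA")     # W and R give 2 x 2 variants
```

The reverse primer is therefore a mixture of four oligonucleotides. The forward primer is a single sequence.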

Analysis of CAZyme and AMR content

Putative genes were called on assembled contigs for each sample or on assembled MAGs using Prodigal63. CAZyme genes were identified using (v.3.0.5)64 with default parameters (searching with HMMER, eCAMI and DIAMOND). Genes identified in at least two of three programmes were dereplicated to create a curated database. Metagenomic reads for each sample were mapped against this database to calculate the percentage of reads mapped. AMR genes were identified using rgi (v.5.2.0)28 with default parameters. All identified genes were filtered for >90% coverage and dereplicated to create a curated database of AMR genes. Metagenomic reads for each sample were mapped against this database to calculate the percentage of reads mapped.
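The "at least two of three programmes" consensus and the mapped-read percentage described above reduce to simple set and arithmetic operations. This sketch assumes that each tool's hits are available as a set of gene identifiers. The gene names in the usage note are hypothetical.

```python
from collections import Counter


def consensus_hits(hits_by_tool, min_tools=2):
    """Genes reported by at least min_tools of the annotation tools."""
    votes = Counter(gene for hits in hits_by_tool.values() for gene in set(hits))
    return {gene for gene, n in votes.items() if n >= min_tools}


def percent_reads_mapped(mapped_reads, total_reads):
    """Percentage of a sample's reads that mapped to the curated gene database."""
    return 100.0 * mapped_reads / total_reads if total_reads else 0.0
```

For example, with hypothetical hits `{"HMMER": {"g1", "g2", "g3"}, "eCAMI": {"g1", "g4"}, "DIAMOND": {"g1", "g2"}}`, only g1 and g2 enter the curated database.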

CARD (the Comprehensive Antibiotic Resistance Database, used by rgi) is known to be biased towards pathogens such as Escherichia/Shigella species28. Indeed, the relative abundance of Escherichia/Shigella was strongly positively correlated with the abundance of AMR genes in intestinal samples (Extended Data Fig. 5h). In stool samples, no individual ASV was positively correlated with the percentage of reads aligned to CARD. However, the abundances of the Enterobacteriaceae and Bacteroidaceae families were both positively correlated with it (Extended Data Fig. 5i). To determine whether these correlations were driven by efflux pump genes, we recomputed AMR gene abundance after excluding the 1,273 genes annotated as efflux pumps. In this analysis, intestinal samples did not exhibit significantly higher numbers of reads mapping to non-efflux AMR genes (Extended Data Fig. 5j). We also identified AMR genes in each MAG. Enterobacteriaceae MAGs possessed ~10- to 100-fold more AMR genes (normalized to the total number of genes) than MAGs from other taxonomic families (Extended Data Fig. 5k).
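The efflux-excluded recomputation amounts to summing mapped reads over AMR genes while skipping those annotated with a given mechanism. This sketch assumes a per-sample read count for each gene and a gene-to-mechanism annotation table. The gene names and mechanism labels in the usage note are hypothetical placeholders.

```python
def amr_percent(read_counts, mechanisms, total_reads, exclude=None):
    """Percentage of a sample's reads mapping to AMR genes, optionally skipping one mechanism.

    read_counts: gene -> reads mapped to that gene in this sample.
    mechanisms:  gene -> resistance-mechanism label.
    exclude:     mechanism label to leave out (e.g. an efflux label); None keeps all genes.
    """
    kept = sum(
        n for gene, n in read_counts.items()
        if exclude is None or mechanisms.get(gene) != exclude
    )
    return 100.0 * kept / total_reads
```

For example, with hypothetical counts `{"geneA": 30, "geneB": 10, "geneC": 10}` out of 1,000 reads, AMR abundance is 5.0%. If geneA is the only efflux gene, the value drops to 2.0% once efflux is excluded.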

Viral contig identification

After assembly, contigs >1 kb in length were analysed using VirSorter2 (ref. 65), DeepVirFinder66 and VIBRANT67. Contigs identified as viral by at least one of the following criteria were clustered at 95% ANI and 85% coverage:

* a VirSorter2 score ≥0.9;
* a DeepVirFinder score ≥0.9 and P < 0.05;
* a VIBRANT quality of medium or higher.

The quality of the clustered contigs was assessed using CheckV68, which also classified viral contigs as prophages if they contained both viral and bacterial regions.
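The "identified by at least one algorithm" rule above is a logical OR of three per-tool criteria, applied after the 1-kb length filter. The following sketch encodes it. The VIBRANT quality labels are simplified, hypothetical strings ("low", "medium", "high", "complete") and do not reproduce the tool's exact output.

```python
# Hypothetical, simplified VIBRANT quality labels ordered from worst to best.
VIBRANT_RANK = {"low": 0, "medium": 1, "high": 2, "complete": 3}


def is_viral_candidate(length_bp, vs2_score=None, dvf_score=None, dvf_p=None,
                       vibrant_quality=None):
    """True if a contig >1 kb passes at least one of the three caller criteria."""
    if length_bp <= 1000:
        return False
    vs2 = vs2_score is not None and vs2_score >= 0.9
    dvf = (dvf_score is not None and dvf_p is not None
           and dvf_score >= 0.9 and dvf_p < 0.05)
    vib = (vibrant_quality is not None
           and VIBRANT_RANK.get(vibrant_quality, -1) >= VIBRANT_RANK["medium"])
    return vs2 or dvf or vib
```

Because the rule is a union, it favours sensitivity. The subsequent clustering and CheckV quality assessment then act as downstream checks.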

Detection of prophage induction events

PropagAtE69 was used with default parameters to identify active prophages. In each sample, the total reads were first rarefied so that the number of reads mapped as viral was 5 × 10^5. Six device samples and ten saliva samples had fewer than 5 × 10^5 reads, so all reads from these samples were used. The reads were then mapped to the prophage contigs with a minimum percent identity of 97%. The algorithm identifies a prophage as active (induced) when the prophage-to-host read-depth ratio for that contig is >2 and the prophage region has >50% coverage.
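The two steps above are subsampling to a fixed read budget and then applying the depth-ratio and coverage thresholds. This sketch illustrates only the stated thresholds; it is not a reimplementation of PropagAtE. The random seed is arbitrary. Treating a zero host depth as "not callable" is an assumption.

```python
import random


def rarefy(reads, target=500_000, seed=0):
    """Randomly subsample reads without replacement; samples below target keep all reads."""
    reads = list(reads)
    if len(reads) <= target:
        return reads
    return random.Random(seed).sample(reads, target)


def is_induced(prophage_depth, host_depth, prophage_breadth, min_ratio=2.0, min_breadth=0.5):
    """Call a prophage active if prophage:host depth > min_ratio and coverage breadth > min_breadth."""
    if host_depth <= 0:
        return False  # ratio undefined; treated as not callable in this sketch
    return prophage_depth / host_depth > min_ratio and prophage_breadth > min_breadth
```

For example, a prophage region at 30× depth on a host contig at 10× depth, with 80% of the region covered, would be called induced.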

Proteomics sample preparation

After thawing, 20 µl of MS-grade water (Pierce) was added to each sample and the mixture was vortexed. Twenty microlitres of this mixture was transferred to a 96-well plate (AFA-TUBE TPX plate, cat. no. 520291, Covaris). Twenty microlitres of cell lysis buffer (containing Tris, CAA, TCEP and 8% SDS)9 was added to each sample aliquot. Samples were boiled for 10 min in a PCR thermocycler (Eppendorf) to reduce disulfide bridges, alkylate cysteines and boost protein denaturation. After boiling, samples were placed in a −80 °C freezer to promote dissociation of microbial capsules. Freeze–thaw cycles were repeated twice. Samples were then processed using the APAC protocol. In brief, Adaptive Focused Acoustics (AFA, Covaris) sonication was applied in the 96-well plate for a total duration of 300 s per column with an LE220-plus Focused ultrasonicator (Covaris) using the following parameters: peak power, 450 W; duty factor, 50%; cycles, 200; average power, 225 W.

In preparation for protein aggregation capture (PAC), magnetic carboxylate-modified particles (Sera-Mag, cat. no. 24152105050350, GE Healthcare/Merck) were washed three times with 1 ml of MS-grade water. Because the protein concentration of the samples varied over a large range, 500 µg of beads were added to each sample well to ensure sufficient beads regardless of the protein concentration. Protein precipitation was induced by the addition of acetonitrile at a final concentration of 70%.

Proteins were subsequently captured from the solution on the magnetic particles and purified by three washes in isopropanol. After each wash, the plate was placed at 50 °C and shaken at 1,300 rpm for 10 min. To ensure complete precipitation, the suspension was incubated for a further 10 min at room temperature with shaking at 1,300 rpm. The beads were then allowed to settle for 10 min without agitation.

To determine the amount of enzyme needed for digestion, protein yield was measured using a Nanodrop. Samples were then resuspended in 100 µl of digestion buffer (100 mM Tris, pH 8.5) supplemented with 0.5 µg trypsin and 0.5 µg LysC to achieve an enzyme:protein ratio of 1:20. Samples were incubated overnight at 37 °C with shaking at 1,300 rpm.

After digestion, the 96-well plate was placed on a magnetic rack (DynaMag-96 Side Skirted Magnet, cat. no. 12027, Invitrogen, ThermoFisher Scientific). This allowed the supernatant to be easily transferred to a 96-well PCR plate (twin.tec PCR Plate LoBind, semi-skirted, 250 µl; cat. no. 0030129504, Eppendorf). The enzymatic reaction in the collected supernatant was quenched with trifluoroacetic acid (TFA) at a final concentration of 1% (v/v). Peptides were purified using two-layer SDB–RPS (Empore SPE Disks; CDS Analytical, cat. no. 98-0604-0226-4) StageTips with three washing steps: twice with 1% TFA in isopropanol and once with 0.2% TFA in water. Peptides were then eluted from the StageTips with elution buffer (80% acetonitrile and 1% ammonia)70. Purified samples were vacuum-dried in a SpeedVac (Eppendorf) at 60 °C for 1.5 h and resuspended in A* injection buffer (2% (v/v) acetonitrile and 0.1% (v/v) TFA in water). Peptide concentration in injection buffer was measured for each sample using a Nanodrop, and samples were stored at −20 °C until MS measurement.

Proteomics UHPLC and mass spectrometry

Samples were analysed using LC–MS instrumentation, comprising an EASY-nLC 1200 ultra-high-pressure system coupled to an Exploris 480 with a nano-electrospray ion source (ThermoFisher Scientific). For each sample, the equivalent of 360 ng of purified peptides was separated on a custom 50-cm C18 LC column71. Peptides were eluted from the column using a linear gradient from 10% to 30% buffer B over 90 min at a constant flow rate of 300 nl min−1, followed by a stepwise increase of buffer B to 60% for 5 min and an increase to 95% buffer B over the following 5 min. Afterwards, we applied a 5-min wash with 95% buffer B, followed by a decrease to 1% buffer B over 5 min and a 20-min wash.

The column temperature was kept constant at 50 °C using a custom oven, and HPLC parameters were monitored in real time using SprayQC software72. MS data were acquired with a Top15 data-dependent MS/MS method. The target value for full-scan MS spectra was 3 × 10^6 charges in the m/z range 300–1,650, with a maximum injection time of 20 ms and a resolution of 60,000 at m/z 200. Precursor ions were fragmented by higher-energy collisional dissociation (HCD) at a normalized collision energy of 27%. MS/MS scans were acquired at a resolution of 15,000 at m/z 200 with a target value of 1 × 10^5 and a maximum injection time of 28 ms. Dynamic exclusion was set to 40 s to avoid repeated sequencing of identical peptides.

A HeLa sample was run approximately every 70 samples to verify that LC and MS performance was maintained throughout the study. Technical replicates of randomly selected samples were acquired for each plate to assess technical reproducibility. In all, 212 device samples and 56 stool samples passed quality control and were used for analyses (Extended Data Fig. 2a).

Proteomics data processing

MS raw files were analysed with MaxQuant software (v., and peptide lists were searched against the UniProt human Swiss-Prot and TrEMBL FASTA databases (June 2022 version). A common contaminants database was also included74. Search parameters included cysteine carbamidomethylation as a fixed modification, and N-terminal acetylation and methionine oxidation as variable modifications. The false discovery rate (FDR) for proteins and peptides was set to 0 at a minimum peptide length of 7 amino acids. An in silico tryptic digest was used with a maximum of two missed cleavage sites. Peptide identification was performed at a precursor mass accuracy of 7 ppm and a fragment mass accuracy of 20 ppm. A reversed decoy database was used to estimate the fraction of false positive hits. Label-free quantification (LFQ) was performed at a minimum ratio count of 2 (ref. 75). LFQ values, or non-normalized intensity values where indicated, were further processed in R (v.4.1.2). Proteins were filtered for 70% valid values across all samples. For PCA, missing values were imputed with the regularized method of the missMDA package (v.1.19), and PCA plots were generated with PCAtools (v.2.4.0). Statistical analysis was performed with limma (v.3.48.3) using a moderated t-test with FDR adjustment for multiple-hypothesis testing.
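The 70% valid-value filter above can be sketched with NumPy, assuming a proteins × samples LFQ matrix. As in MaxQuant output, missing quantifications are taken to be recorded as 0 (or NaN). The study performed this step in R; the sketch only mirrors the rule.

```python
import numpy as np


def filter_valid_values(lfq, min_fraction=0.70):
    """Keep proteins (rows) quantified in at least min_fraction of samples (columns).

    Zero or NaN entries are treated as missing values.
    """
    x = np.asarray(lfq, dtype=float)
    valid = np.nan_to_num(x, nan=0.0) > 0
    keep = valid.mean(axis=1) >= min_fraction
    return x[keep], keep
```

With ten samples, for example, a protein quantified in eight samples is kept, whereas one quantified in six is removed before imputation and PCA.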

Sample preparation for LC–MS/MS metabolomics analysis

Supernatants from intestinal samples were extracted using a modified 96-well-plate biphasic extraction76. Samples in microcentrifuge tubes were thawed on ice, and 10 µl was transferred to the wells of a 2-ml polypropylene 96-well plate in a predetermined randomized order. A quality-control sample consisting of a pool of many intestinal samples from pilot studies was used to assess analytical variation. A quality-control sample (10 µl) and a blank (10 µl of LC–MS-grade water) were included once per ten samples. Next, 170 µl of methanol containing UltimateSPLASH internal standards (Avanti Polar Lipids) was added to each well. Then, 490 µl of methyl-tert-butyl ether (MTBE) containing the internal standard cholesterol ester 22:1 was added to each well. Plates were sealed, vortexed vigorously for 30 s and shaken on an orbital shaker for 5 min at 4 °C. The plates were unsealed, and 150 µl of cold water was added to each well. Plates were resealed, vortexed vigorously for 30 s and centrifuged at 4,000 RCF for 12 min at 4 °C.
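A predetermined randomized order with interleaved quality-control and blank wells, as described above, can be generated reproducibly from a fixed seed. This sketch makes three assumptions: one QC well and one blank are placed after every tenth sample, the seed value is arbitrary, and the "QC"/"BLANK" labels are hypothetical.

```python
import random


def build_run_order(sample_ids, every=10, seed=42):
    """Shuffle samples reproducibly and insert a QC and a blank after every `every` samples."""
    order = list(sample_ids)
    random.Random(seed).shuffle(order)
    run = []
    for i, sample in enumerate(order, start=1):
        run.append(sample)
        if i % every == 0:
            run.extend(["QC", "BLANK"])
    return run
```

Fixing the seed makes the randomization auditable: the same sample list always yields the same plate layout.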

From the top phase of each extraction well, two 180-µl aliquots were transferred to new 96-well plates, and two 70-µl aliquots from the bottom phase were transferred to two other new 96-well plates. Plates were spun in a rotary vacuum until dry, sealed and stored at −80 °C until LC–MS/MS analysis. The dried aqueous-phase extract in one of these plates was redissolved in 35 µl of HILIC-run solvent (8:2 acetonitrile/water). Five microlitres was then analysed by non-targeted HILIC LC–MS/MS, with the autosampler kept at 4 °C. Immediately after HILIC analysis, the plates were again spun in a rotary vacuum until dry, sealed and stored at −80 °C until targeted bile acid analysis.

Multiple dilutions were prepared for bile acid analysis as follows. The dried samples described above were redissolved in 60 µl of bile acid-run solvent (1:1 acetonitrile/methanol containing six isotopically labelled bile acid standards at 100 ng ml−1) by vortexing for 30 s and shaking for 5 min on an orbital shaker. From this plate, 5 µl was transferred to a new 96-well plate and combined with 145 µl of bile acid-run solvent. Both dilutions were analysed for all samples. Samples that still contained bile acids above the highest concentration of the standard curve (1,500 ng ml−1) were diluted 5:145 once more and re-analysed. A nine-point standard curve ranging from 0.2 ng ml−1 to 1,500 ng ml−1 was used for all samples. Standard-curve solutions were created by drying bile acid standard solutions to achieve the desired mass of each standard and then redissolving them in bile acid-run solvent. Three standard-curve concentrations were analysed after every 20 samples during data acquisition, along with one method blank.

For stool analyses, approximately 4 mg (±1 mg) of wet stool was transferred to 2-ml microcentrifuge tubes. Twenty microlitres of quality-control mix was added to the microcentrifuge tubes for quality-control samples. Blank samples were generated using 20 µl of LC–MS-grade water. To each tube, 225 µl of ice-cold methanol containing internal standards (as above) was added, followed by 750 µl of ice-cold MTBE with cholesterol ester 22:1. Two 3-mm stainless steel grinding beads were added to each tube, and tubes were processed in a Geno/Grinder automated tissue homogenizer and cell lyser at 1,500 rpm for 1 min. Then, 188 µl of cold water was added to each tube. Tubes were vortexed vigorously and centrifuged at 14,000 RCF for 2 min. Two aliquots of 180 µl each of the MTBE layer and two aliquots of 50 µl each of the lower layer were transferred to four 96-well plates, and the plates were spun in a rotary vacuum until dry, sealed and stored at −80 °C until analysis with the intestinal samples. Stool samples were analysed using HILIC non-targeted LC–MS/MS and diluted in an identical manner to intestinal samples as described above. Stool samples were analysed in a randomized order after intestinal samples.

Metabolomics data acquisition

Samples were analysed using a Vanquish UHPLC system coupled to a TSQ Altis triple-quadrupole mass spectrometer (ThermoFisher Scientific). An Acquity BEH C18 column (1.7 µm, 2.1 mm × 100 mm) with an Acquity BEH C18 guard column (1.7 µm, 2.1 mm × 5 mm) was used for chromatographic separation with mobile phases A (LC–MS-grade water with 0.1% formic acid) and B (LC–MS-grade acetonitrile with 0.1% formic acid) and with a flow rate of 400 µl min–1 and column temperature of 50 °C. The gradient began at 20% B for 1 min and shifted to 45% B between 1 and 11 min, to 95% B between 11 and 14 min and to 99% B between 14 and 14.5 min; 99% B was maintained until 15.5 min and transitioned to 20% B between 15.5 and 16.5 min; and 20% B was maintained until 18 min. The autosampler temperature was kept at 4 °C. The injection volume was 5 µl, and MRM scans were collected for all bile acids and internal standards (Supplementary Table 6).
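The mobile-phase program above can be written as a table of (time, %B) breakpoints, with linear interpolation between them. This sketch assumes linear ramps between the stated time points, which the phrase "shifted to" implies but does not state explicitly.

```python
import numpy as np

# (time in min, % mobile phase B) breakpoints from the gradient described above.
GRADIENT = [(0.0, 20), (1.0, 20), (11.0, 45), (14.0, 95),
            (14.5, 99), (15.5, 99), (16.5, 20), (18.0, 20)]


def percent_b(t_min):
    """Percentage of mobile phase B at time t_min, linearly interpolated."""
    times, b = zip(*GRADIENT)
    return float(np.interp(t_min, times, b))
```

For example, `percent_b(6.0)` falls halfway through the 1–11-min ramp and returns 32.5.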

Metabolomics data processing

MRM scans were imported into Skyline77 software. Skyline performed peak integration for all analytes using mass transitions and retention-time windows optimized with authentic chemical standards (Supplementary Table 6). The chromatogram for each analyte was manually checked to confirm correct peak integration, and peak areas were exported for all analytes. Analytes were omitted from further analysis if no convincing chromatographic peak was observed in any sample (Supplementary Table 6). The ratio of each analyte to its closest-eluting internal standard was calculated and used for quantification. A linear model was fit to the standard-curve points for each bile acid (R² > 0.995 for all bile acids) and applied to all samples and blanks to calculate concentrations. The average concentration of the method blanks was subtracted from sample concentrations. Because multiple dilutions were analysed for each sample, the measurement closest to the centre of the standard curve (750 ng ml−1) was used. Zero values were imputed with a concentration value between 0.001 and 0.1 ng ml−1. Concentrations were reported as ng ml−1 for intestinal supernatant and ng g−1 for wet stool. In all, 218 device samples and 57 stool samples passed quality control and were used for analyses (Extended Data Fig. 2a).
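The quantification chain above can be sketched as four steps:

* fit a linear standard curve of area ratio against concentration;
* back-calculate the in-vial concentration;
* subtract the mean method-blank value;
* keep the dilution whose reading lies closest to the 750 ng ml−1 curve centre, and undo that dilution.

The dilution factors (1×, 30× and 900×) are inferred from the 5 µl + 145 µl steps in the dilution scheme, relative to the 60-µl reconstitution. Applying the blank correction to in-vial values and clipping negative results to zero are assumptions.

```python
import numpy as np

# Assumed dilution factors: undiluted, one 5:145 step (30x), two steps (900x).
DILUTION_FACTORS = (1.0, 30.0, 900.0)


def fit_standard_curve(std_conc, std_ratio):
    """Least-squares line of analyte/internal-standard ratio versus concentration (ng/ml)."""
    slope, intercept = np.polyfit(std_conc, std_ratio, 1)
    return slope, intercept


def back_calculate(ratio, slope, intercept, blank_mean=0.0):
    """In-vial concentration from an area ratio, minus the mean method-blank concentration."""
    return max((ratio - intercept) / slope - blank_mean, 0.0)


def reported_concentration(in_vial, factors=DILUTION_FACTORS, centre=750.0):
    """Pick the dilution read closest to the curve centre and multiply by its dilution factor."""
    idx = min(range(len(in_vial)), key=lambda i: abs(in_vial[i] - centre))
    return in_vial[idx] * factors[idx]
```

For example, readings of 1,600 ng ml−1 (undiluted, above the curve) and 53.3 ng ml−1 (30×) yield about 1,599 ng ml−1, taken from the diluted measurement.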

Non-targeted bile acid quantification

Bile acids conjugated to amino acids (for example, TyroCA, LeuCA and PhenylCA) were not included in the list for targeted analysis. Nonetheless, 22 microbially conjugated bile acids were detected in intestinal and stool samples during non-targeted data acquisition using HILIC chromatography, as described previously78. Peaks corresponding to these microbially conjugated bile acids were annotated using three features, as reported previously40:

* the precursor m/z;
* diagnostic MS/MS fragment ions (m/z 337.2526 for trihydroxylated and m/z 339.2682 for dihydroxylated bile acids);
* the corresponding amide conjugate fragment ion (Supplementary Table 7).

Previously collected experimental MS/MS spectra35 of synthetic standards for three microbially conjugated bile acids (Extended Data Fig. 9) served as positive controls. Non-targeted HILIC analysis did not include bile acid standard curves for direct quantification. Approximate quantification was therefore achieved by comparing GCA concentrations from targeted analysis with GCA peak-height intensities from non-targeted analysis. A quadratic model was fit to the paired GCA values from both analyses (R² = 0.89). It was then applied to the peak-height intensities of microbially conjugated bile acids to calculate their approximate concentrations. These approximate concentrations were used in all analyses of bile acids measured by non-targeted analysis.
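The GCA-based approximation above can be sketched with a quadratic least-squares fit. Paired GCA peak heights (non-targeted) and GCA concentrations (targeted) calibrate the fit, which is then evaluated on the peak heights of other conjugates. This sketch also computes R² from the residuals. It assumes the paired values are aligned arrays. Extrapolating the quadratic beyond the range of the GCA values would be unreliable.

```python
import numpy as np


def fit_quadratic(peak_height, targeted_conc):
    """Fit conc = a*h**2 + b*h + c; return the coefficients and the R^2 of the fit."""
    h = np.asarray(peak_height, dtype=float)
    y = np.asarray(targeted_conc, dtype=float)
    coeffs = np.polyfit(h, y, 2)
    residuals = y - np.polyval(coeffs, h)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return coeffs, r2


def approx_concentration(coeffs, peak_height):
    """Approximate concentration for peak heights of microbially conjugated bile acids."""
    return np.polyval(coeffs, peak_height)
```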

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.