net. The computer code for executing this analysis is available on the GitHub site for this paper. See Supplementary Methods for additional details.

Hypercorrelator analysis

The top 15 ASD hypercorrelators were identified by ranking all 467 measured metabolites by the total difference in out-of-pathway correlations (edges) meeting a q value of <0.05 and calculating the difference of ASD-TD. The top 15 had the most positive (+) differences between ASD and TD. Reciprocally, the top 15 TD hypercorrelators had the most negative (−) differences between ASD and TD. This ranking is reported in Supplementary Data 13 and 14.

Statistics and reproducibility

Demographic data were analyzed by t-tests or non-parametric Mann-Whitney U tests in GraphPad Prism (https://www.graphpad.com). Categorical data and 2 × 2 tables were analyzed by Fisher’s exact test. AUC data from metabolomics were log2 transformed, scaled by control standard deviations, and the resulting z scores were analyzed by VIP scores calculated by multivariate PLS-DA in MetaboAnalyst 5.0119. Cross-validation was used to calculate accuracy, r2, and q2 statistics120. Permutation analysis (×1000) was used to calculate the p value for the model. Mean decrease in accuracy (MDA) scores were calculated by random forest analysis from 5000 trees in R121. FDRs were calculated by the method of Benjamini and Hochberg122 and Bayesian false discovery rates by Storey q value123. Metabolites with VIP ≥ 1.5 and MDA > 0 were considered significant and were used for pathway enrichment analysis121. Bubble impact plots were visualized in Python. Significance was quantified as the hypergeometric p value and plotted on the y-axis. The k-nearest neighbor algorithm124, a non-parametric classification method used to cluster metabolites and pathways that behave in similar ways, was used to identify the metabolite clusters. Classifiers of 4–7 metabolites were selected and tested for diagnostic accuracy using area under the receiver operator characteristic curve and random forest analysis. Five hundred bootstrappings were used to calculate the 95% confidence intervals of the ROC curve. The 2 × 2 confusion matrix generated in MetaboAnalyst was used to calculate the sensitivity, specificity, and the accuracy of the diagnostic classifier. Diagnostic accuracy was calculated using the 1000 permutations120. Classifiers were validated within sample using repeated double cross-validation with bootstrapping 100 times to test random subsamples of 2/3 in and 1/3 out, and by permutation analysis125. See Supplementary Information for additional methods.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

A special note from the authors

Although the words “normal” and “abnormal”, “regulated” and “dysregulated”, and “functional” and “dysfunctional” are used in this paper, a major result of this research was the finding that the ASD phenotype is the result of developmental changes caused by a normal physiologic response to the signals that cells are receiving in autistic children. The cells of neurotypical children do not receive the same cell danger signals, and therefore do not show the same set of mitochondrial, metabolic, immunologic, microbiome, autonomic, neuroendocrine, neurodevelopmental, and behavioral changes. We are sensitive to the negative impact that poorly chosen words can have126,127,128,129 and have endeavored to use neutral, specific, and descriptive language whenever possible.

Study approvals

This study was approved by the University of California, San Diego (#140072, #171940, #190972), California state (#1162, #2018-020), and University of California, Davis (#226004-4) institutional review boards (IRBs). Informed signed consent was obtained prior to participation.