Differential Gene Expression Analysis in Candida albicans: A Comparative Study Between Malic Acid and Glucose Media for Insights into Pathogenicity and Metabolic Pathway Interactions

Candida albicans is a pathogenic fungus that causes human infections ranging from localized candidiasis to systemic invasive disease. This study examined differences in C. albicans gene expression between cultures grown in malic acid and glucose media. Data set GSE223317 was retrieved from the Gene Expression Omnibus (GEO) database. Bioinformatic analyses were performed in RStudio using the limma and edgeR packages. After normalization of the raw data, 278 differentially expressed genes (DEGs) were identified, of which 127 showed increased expression and 151 showed reduced expression in the malic acid medium compared to the glucose medium. Gene Ontology (GO) and KEGG pathway analyses were then conducted via the Candida Genome Database and DAVID. These analyses revealed substantial enrichment in biochemical pathways, including the tricarboxylic acid (TCA) cycle and the biosynthesis of secondary metabolites. Protein-protein interaction networks were constructed with the STRING database, and hub genes associated with metabolic pathways relevant to the shift from glucose to malic acid as a carbon source were identified using the MCODE and cytoHubba plugins in Cytoscape. The findings suggest that transitioning from glucose to malic acid as a carbon source could enhance the metabolic adaptability and potential pathogenicity of C. albicans. Functional enrichment analyses also showed that key hub genes and top-ranking DEGs were significantly involved in oxidoreductase activity, mitochondrial function, and hyphal formation.


INTRODUCTION
Candida albicans (C. albicans) is a prevalent fungal microorganism that colonizes human surfaces, particularly the gastrointestinal tract (Li et al., 2022). In most cases, this fungus resides innocuously as a commensal, yet under specific circumstances it can cause a spectrum of infections, ranging from mucosal disease to critical, potentially fatal systemic candidiasis (Jacobsen, 2023).
The ability of C. albicans to shift between these benign and pathogenic states is a subject of intense research, and understanding the genetic and molecular mechanisms underlying this switch is crucial for developing new therapeutic strategies (Lopes & Lionakis, 2022).
The relationship between biochemical processes and pathogenesis in microorganisms has been a subject of scientific inquiry for decades (Gil-Gil et al., 2023; Pätzold et al., 2021). Historically, it has been observed that certain pathogens, including C. albicans, exhibit heightened virulence when grown under specific nutrient conditions, leading researchers to hypothesize that nutrient availability could directly or indirectly influence the expression of virulence factors (Pensinger et al., 2023). The carbon source is a major nutrient affecting the metabolic pathways of C. albicans. Similar to many organisms, C. albicans primarily uses glucose as its main carbon and energy source (Maliszewska et al., 2020). However, in environments where glucose is scarce, this adaptable fungus can metabolize alternative carbon sources (Bayot et al., 2023; Lok et al., 2021). One alternative is malic acid, a dicarboxylic acid that contributes to the citric acid cycle (Katayama et al., 2022). Preliminary studies indicate that the metabolism of non-glucose carbon sources like malic acid can modulate the expression of C. albicans' virulence factors (Pountain et al., 2021). However, the molecular mechanisms underlying this modulation remain poorly understood. Thus, a deeper understanding of how C. albicans adapts to different carbon sources could provide critical insights into its metabolic flexibility and potential virulence mechanisms (Ene et al., 2012).
In the context of C. albicans, early studies in the 1980s and 1990s started drawing connections between carbon metabolism and virulence (Hostetter, 1990; McCourtie & Douglas, 1984). These investigations centered on glucose and its impact on the transition from the yeast to the hyphal form, an essential virulence determinant in C. albicans (Gergondey et al., 2016). The yeast form is typically nonpathogenic and is associated with commensal colonization, while the hyphal form is invasive and linked to tissue damage and dissemination (Laín et al., 2007). Malic acid, a key intermediate in the tricarboxylic acid cycle, can serve as a carbon source under glucose-limited conditions (Bernard & Guéguen, 2022). The utilization of tricarboxylic acid cycle intermediates by C. albicans in various environmental niches, including the gastrointestinal tract, is of scientific interest. The contribution of these intermediates to the pathogenicity of C. albicans remains unclear. While some preliminary studies indicate that growth in the presence of these intermediates might affect gene expression (Brandt et al., 2023), a detailed analysis is still lacking.
Hence, the current study aims to perform a comprehensive differential gene expression analysis of C. albicans cultured in malic acid versus glucose media. The overarching goal is to identify key metabolic and virulence pathways modulated by the type of carbon source. Bioinformatic tools and databases, including RStudio, limma, edgeR, the Candida Genome Database, DAVID, and STRING, were used to analyze the data set GSE223317 from the Gene Expression Omnibus (GEO) database. The analysis focuses on identifying DEGs that are up- or down-regulated in a malic acid medium compared to a glucose medium. It also investigates the functional implications of these DEGs, especially concerning metabolic pathways and their potential roles in the pathogenicity of C. albicans.

MATERIALS AND METHODS
1-Dataset Retrieval:
The gene expression data for C. albicans were sourced from the GEO database (dataset GSE223317), available at https://www.ncbi.nlm.nih.gov/geo/. The parent investigation, conducted using the Illumina NovaSeq 6000 platform (GPL28323 for C. albicans), aimed to investigate how C. albicans adapts metabolically to a variety of carbon and nitrogen sources, including malic acid. A control medium with glucose as the carbon source was used as a reference. Transcriptional profiling was carried out following a 4-hour incubation at 37°C. For the present analysis, three samples grown with glucose and three grown with malic acid as the sole carbon source were selected, as outlined in Table 1.
All IDs obtained from the dataset were converted into their respective LOCUS_TAG using the Candida Genome Database (CGD; http://www.candidagenome.org/) (Skrzypek et al., 2017). This conversion was done to ensure consistency with the databases subsequently used in the study. Additionally, we verified that the dataset contained no duplicated gene names. The flow of the current study is represented in Figure 1.

2-Data Preprocessing and Normalization:
To improve the accuracy of the differential gene expression analysis in this study, genes that were non-coding or showed low expression were filtered out. This process was carried out after conducting quality control checks, read mapping, and normalization of read counts. The differential gene expression and functional annotation analyses were executed using R software (v 4.3.0) and RStudio (v 2023.06.0 Build 421).
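The low-expression filter described above can be illustrated with a minimal sketch. The study itself used R and does not report its filtering thresholds, so the counts-per-million (CPM) rule, the function names, the cutoff values, and the toy counts below are all assumptions made for illustration only.

```python
def counts_per_million(counts):
    """Convert a {gene: [count per sample]} table to CPM per sample."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total reads in each sample (column sum).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0, min_samples=3):
    """Keep genes with CPM >= min_cpm in at least min_samples samples.

    Both thresholds are illustrative assumptions, not the study's values.
    """
    cpm = counts_per_million(counts)
    return {
        gene: row
        for gene, row in counts.items()
        if sum(v >= min_cpm for v in cpm[gene]) >= min_samples
    }


# Hypothetical toy counts: 3 glucose samples followed by 3 malic acid samples.
toy_counts = {
    "geneA": [500, 480, 510, 900, 950, 870],
    "geneB": [0, 1, 0, 0, 0, 1],
    "geneC": [300, 310, 290, 100, 120, 110],
}
# The toy libraries are tiny, so a large CPM cutoff is used here on purpose.
kept = filter_low_expression(toy_counts, min_cpm=1000.0, min_samples=3)
print(sorted(kept))
```

Requiring the cutoff to be met in at least as many samples as the smallest group (three here) retains genes that are expressed in only one condition, which are often the most informative DEGs.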

3-Principal Component Analysis (PCA):
The refined dataset was visualized using boxplots and assessed using principal component analysis (PCA). This analysis aimed to explore the variance attributable to the carbon source, comparing glucose, the primary carbon source, with malic acid. This verification step was essential to confirm the dataset's suitability for subsequent analyses. For correlation analyses within and between samples of both groups, the corrplot package in R was utilized. The normalized dataset was then analyzed using the limma package.
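The within-group versus between-group correlation check can be sketched as follows. This is not the corrplot workflow used in the study; it is a plain Pearson correlation computed on invented log-expression values, with all sample names and numbers being hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(samples):
    """Pairwise sample-to-sample correlations as a nested dict."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


# Hypothetical log-expression values for four genes in four samples.
toy_samples = {
    "glucose_1": [8.0, 2.0, 5.0, 7.0],
    "glucose_2": [8.2, 2.1, 4.9, 7.1],
    "malic_1": [3.0, 7.5, 5.2, 2.5],
    "malic_2": [3.1, 7.4, 5.0, 2.7],
}
corr = correlation_matrix(toy_samples)
for name, row in corr.items():
    print(name, [round(row[other], 3) for other in toy_samples])
```

In a well-behaved dataset, replicates of the same condition correlate more strongly with each other than with samples from the other condition, which is the pattern the correlation plot is meant to confirm.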

4-Differentially Expressed Genes (DEGs) Identification:
The primary objective of this analysis was to identify and quantify the DEGs between the malic acid group and the glucose group, which served as the control. To achieve this, a design matrix was established, incorporating the experimental conditions (glucose and malic acid) as covariates. The analysis was then carried out using the 'limma' and 'edgeR' packages in R, available through the Bioconductor project. Specific contrasts were defined between these conditions to focus on the biological questions of interest. A contrast matrix was set up using the 'makeContrasts' function to isolate the variations in expression patterns between the malic acid and glucose conditions. The count data underwent normalization and variance modeling using the 'voom' function. This step also generated a diagnostic plot to visualize the mean-variance relationship across all genes in the dataset. Following data normalization, a linear model was fitted using the 'lmFit' function.
The defined contrasts were then integrated into the model using the 'contrasts.fit' function. The empirical Bayes method was employed to identify the DEGs between the two groups. Genes with an adjusted p-value (adj.p-value) of less than 0.05 and an absolute log2 fold change (log2FC) of 1 or greater were considered significantly differentially expressed.
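The significance criteria above can be expressed as a short sketch. The text does not name the multiple-testing correction, so the Benjamini-Hochberg procedure (the default adjustment in limma's topTable) is assumed here; the gene labels, fold changes, and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify(log2fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using adj.p < 0.05 and |log2FC| >= 1, as stated in the text."""
    if adj_p < p_cutoff and abs(log2fc) >= fc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "ns"


# Hypothetical results: gene -> (log2FC malic acid vs glucose, raw p-value).
toy_results = {
    "g1": (2.3, 0.001),
    "g2": (-1.8, 0.004),
    "g3": (0.4, 0.02),
    "g4": (1.5, 0.30),
    "g5": (-1.0, 0.03),
}
genes = list(toy_results)
adj = benjamini_hochberg([toy_results[g][1] for g in genes])
labels = {g: classify(toy_results[g][0], p) for g, p in zip(genes, adj)}
print(labels)
```

Note that a gene must pass both filters: g3 is statistically significant but changes too little, while g4 changes strongly but is not significant after adjustment.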

5-Functional Assessment of GO and KEGG Pathways:
The functional enrichment analysis of DEGs was performed using the web-based application DAVID (http://DAVID.org) (Dennis et al., 2003), which is widely used for GO and KEGG enrichment analysis.
Initially, GO analysis was employed to predict the functions of proteins, covering three main categories: biological process, molecular function, and cellular component. Following this, KEGG pathway analysis was utilized to map DEGs onto specific pathways, creating networks that illustrate molecular interactions, reactions, and relationships.
For the enrichment analysis, DEGs meeting the significance criteria were uploaded to DAVID with a threshold of P<0.05. DAVID's default statistical test was applied, and a p-value less than 0.05 was considered significant for the analysis. In pathway functional analysis, DEGs were linked to KEGG pathways with criteria set at a count >0 and a p-value <0.05.
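The idea behind an over-representation test of this kind can be sketched with a hypergeometric tail probability. This is a simplified illustration, not DAVID's exact procedure (DAVID reports a more conservative variant, the EASE score); apart from the 278 DEGs, the background size, pathway size, and overlap below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for overlap X when n genes are drawn from N, K of them annotated.

    k: DEGs in the pathway, n: total DEGs,
    K: pathway genes in the background, N: background size.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# 278 DEGs (from the study); the other three numbers are assumptions.
p_value = hypergeom_enrichment_p(k=10, n=278, K=30, N=6000)
print(f"enrichment p-value: {p_value:.3g}")
```

With these toy numbers only about 1.4 pathway genes would be expected among the DEGs by chance, so an overlap of 10 yields a very small p-value and the pathway would be reported as enriched.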
To identify hub genes, the cytoHubba plug-in for Cytoscape was used (Chin et al., 2014). It was applied with five topological analyses (Stress, Degree, Closeness, Bottleneck, and Betweenness). Gene involvement in clusters and top-ranking genes are displayed in a heatmap, and intersection sizes between gene sets are displayed in a plot, using the R packages "ComplexHeatmap" (Gu et al., 2016) and "complex UpsetR" (Lex et al., 2014), respectively.
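As a rough illustration of how topological scores lead to hub genes, the sketch below ranks nodes by degree and then intersects the top-ranked genes from several rankings. It is not cytoHubba's implementation; the node names, edges, and the second ranking are hypothetical placeholders rather than interactions from the study's network.

```python
from collections import defaultdict


def degree_ranking(edges):
    """Rank nodes by number of distinct neighbors (ties broken alphabetically)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(neighbors, key=lambda g: (-len(neighbors[g]), g))


def consensus_hubs(rankings, top_n):
    """Genes appearing in the top_n of every ranking."""
    top_sets = [set(r[:top_n]) for r in rankings]
    return set.intersection(*top_sets)


# Hypothetical undirected interaction edges between placeholder nodes.
toy_edges = [
    ("n1", "n2"), ("n1", "n3"), ("n1", "n4"), ("n1", "n5"),
    ("n2", "n4"), ("n2", "n5"), ("n3", "n4"),
]
by_degree = degree_ranking(toy_edges)
# A second, invented ranking standing in for another topological score.
by_other = ["n2", "n1", "n3", "n4", "n5"]
hubs = consensus_hubs([by_degree, by_other], top_n=3)
print(by_degree, sorted(hubs))
```

Requiring agreement across several scores is one simple way to make the hub list less sensitive to the choice of any single centrality measure.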

7-Pathogen-Host Interaction Analysis:
In this phase, the interplay between C. albicans and the host organism was examined for the top hub genes identified in the previous step. The "Pathogen Host Interactions" database (http://www.phi-base.org/) was used to explore whether these hub genes have reported roles in Candida's virulence or pathogenicity. The aim was to clarify the roles these genes play in host-pathogen interactions and thereby to improve understanding of C. albicans' pathogenic mechanisms and their impact on the host's response.

RESULTS
1-Principal Component Analysis (PCA):
The mRNA and transcriptome expression profiles were obtained from the GSE223317 dataset. The data were assessed by PCA to check for systematic errors and to verify their reliability for subsequent analysis. The results showed that differences between groups were greater than those within groups, both before and after data normalization (Fig. 2a). The correlation plot (Fig. 2b) shows that expression profiles differed markedly between groups, whereas differences among samples of the same group were minimal or negligible.

2-Identification of DEGs
The analyzed subset of the GSE223317 dataset comprises six samples: three grown with glucose (serving as controls) and three with malic acid as the sole carbon source. Initially, the read counts were normalized. The results indicated that the median values across samples were consistent, suggesting the dataset is suitable for further analysis (Fig. 3a). Moreover, raw gene expression count data were variance-modeled using the voom function from the limma package to stabilize the mean-variance relationship and make the data more suitable for linear modeling. This process also generated a diagnostic plot to visually assess the mean-variance relationship across the gene set (Fig. 3b). Comparison of the malic acid samples with the glucose samples revealed a total of 278 differentially expressed genes (DEGs). Among them, 127 genes exhibited increased expression, while 151 showed decreased expression. Tables 2 and 3 present the top 25 genes with the highest increase and decrease in expression based on their P-values. These DEGs were further visualized using a volcano plot and heatmap in Figure 4.

3-GO and KEGG Analysis:
The GO and KEGG enrichment analysis of DEGs was conducted using CGD and DAVID, as depicted in Figure 5. For both upregulated and downregulated genes, biological processes were primarily associated with the transport of organic substances and carbohydrates, small molecule metabolic processes, and homeostasis processes. In terms of molecular function, the results showed enrichment in oxidoreductase and transporter activities and vitamin binding. The analysis of cellular components indicated enrichment in mitochondrial components, cell walls, and plasma membranes.

4-Identification of Hub Gene by PPI Network and Modular Analyses:
PPI networks were established utilizing the STRING database, revealing the DEG network illustrated in Figure 6a. This network comprised 278 nodes and 809 edges.
To identify hub genes, we scored all nodes using five topological analyses through the Cytoscape plugin cytoHubba. Genes with high scores in the topological analyses that were also present in clusters were considered hub genes. In descending order, the top 10 hub genes were ACS1, SDH2, FBP1, PFK1, PYC2, ACO1, SCW4, MDH1, KGD1, and IDP2. The ranking of hub genes is visualized in the heatmap shown in Figure 7.

DISCUSSION
In this study, we aimed to elucidate the metabolic adaptations of the fungal pathogen C. albicans in response to different carbon sources, specifically malic acid and glucose, utilizing transcriptomic data from the GSE223317 dataset. This research is critical for understanding the impact of carbon source variations on C. albicans' metabolic pathways and, consequently, its virulence potential (Zeng et al., 2023).
Our preliminary PCA revealed marked differences in gene expression between the groups grown in malic acid and glucose. These differences were corroborated by correlation plots, indicating that the changes were not merely incidental but represented a consistent metabolic response (Figures 2a and 2b). Further analysis identified 278 DEGs within the dataset. Among these, 127 genes were upregulated and 151 were downregulated in malic acid (Fig. 4).
Noteworthy among the top upregulated genes were OP4 and ATO1. OP4's elevated expression aligns with previous studies emphasizing its role in adherence, stress response, and antifungal drug resistance (Ene et al., 2012). ATO1, part of the Ato protein family, has been reported to be essential for fungal escape from macrophages and macrophage lysis (Danhof & Lorenz, 2015). Conversely, INO1 and QDR1 exhibited significant downregulation. INO1 is vital for inositol synthesis and the formation of glycophosphatidylinositol (GPI)-anchored glycolipids on the C. albicans cell surface (Chen et al., 2008). QDR1 has been implicated in lipid remodeling, so its decreased expression may affect cell signaling and attenuate virulence (Shah et al., 2014).
Subsequent GO and KEGG analyses elaborated on the functional implications of these DEGs. Upregulated genes were predominantly associated with biological processes like organic substance transport and carbohydrate metabolism. In contrast, downregulated genes were involved in oxidative stress response and homeostasis (Rivas-Solano et al., 2022). Molecular function enrichment highlighted activities such as oxidoreductase and transporter activities, suggesting enhanced pathogenicity through adaptation to scavenging reactive oxygen species (ROS) (Tian et al., 2023).
The KEGG analysis further illustrated the complex relationship between carbon source adaptation and C. albicans virulence. It revealed changes in metabolic pathways, including fatty acid degradation, potentially affecting the transition to the hyphal form and, thereby, the manifestation of candidiasis (Chen et al., 2020; DeJarnette et al., 2021). Additional pathways like Peroxisome, Fatty acid degradation, and Fatty acid metabolism were upregulated, indicating metabolic flexibility commonly linked to pathogenicity.
A Protein-Protein Interaction (PPI) network was constructed using the STRING database, providing a comprehensive view of DEG interactions (Figure 6a). Subsequent modular analysis identified several high-scoring clusters, with Cluster 1 being the most prominent (Figure 6b). Topological analysis identified key hub genes like ACS1, SDH2, and FBP1, which may be critical for the metabolic adaptation of C. albicans to malic acid (Fig. 7).
Pathways like Peroxisome, Fatty acid degradation, and Fatty acid metabolism were also upregulated, suggesting an increased reliance on fatty acids, possibly as an alternative energy source. This upregulation may contribute to the organism's metabolic flexibility, a feature commonly linked to pathogenicity. The upregulation in processes related to purine metabolism, aerobic respiration, and energy generation underscores their critical role in C. albicans' survival and potentially its increased virulence in malic acid. The observed upregulation in oxidoreductase and transmembrane transporter activities could indicate an enhanced redox balance and nutrient uptake capacity, both of which are vital for survival in hostile host tissues.
Furthermore, we constructed a PPI network using the STRING database, revealing a comprehensive view of the interactions among DEGs (Fig. 6a). Subsequent modular analyses using the MCODE algorithm identified several high-scoring clusters, with Cluster 1 being the most prominent (Fig. 6b).
Utilizing topological analyses, we identified the top 10 hub genes (ACS1, SDH2, FBP1, PFK1, PYC2, ACO1, SCW4, MDH1, KGD1, and IDP2) that appear to be pivotal in the metabolic adaptation of C. albicans to malic acid, as illustrated in Figure 7. These genes have been previously linked to specific functional roles crucial for C. albicans' survival and pathogenicity. For instance, ACS1 has been associated with increased virulence (Solis et al., 2023), while SDH2 is also implicated in virulence (Bi et al., 2018). FBP1 is considered a potential target for therapeutic interventions (Ramírez & Lorenz, 2007), and PFK1 regulates a rate-limiting enzyme in glycolysis, the primary pathway for energy production in C. albicans (Kumar et al., 2020). PYC2 plays a role in oxidative stress tolerance (Chauhan et al., 2003), and ACO1 is involved in the iron homeostasis of the pathogen (Singh et al., 2011). Additionally, SCW4 is associated with drug resistance (Ene et al., 2012), MDH1 has been used as an antigen for antibodies to combat candidiasis (Shukla et al., 2021), KGD1 contributes to fluconazole resistance (Gale et al., 2020), and IDP2 is implicated in yeast autolysis (Ye et al., 2023).
Our analysis of transcriptomic data has offered insights into the metabolic adaptations of C. albicans when exposed to malic acid. PPI networks revealed high-confidence interaction scores (>0.8), emphasizing the potential significance of these findings. These network results align closely with the upregulated metabolic pathways previously identified through KEGG analysis, suggesting multi-layered regulation of C. albicans' pathogenicity in a malic acid environment. Understanding these functional connections between proteins is crucial for grasping their impact on cellular processes, disease mechanisms, or potential treatment pathways.
This study primarily focuses on the molecular and functional changes induced by different carbon sources, aiming to deepen our understanding of environmental factors that may influence the pathogenicity and drug resistance of C. albicans. Such information is vital for developing targeted interventions to control Candida infections. The evidence suggests that malic acid could serve as an environmental modulator, potentially affecting the pathogenicity of C. albicans. This provides a compelling direction for future research and clinical considerations.
Specifically, there may be a need for tailored treatment strategies in environments abundant in malic acid. Moreover, this could stimulate additional research into how environmental factors affect microbial pathogenicity. While our inferences are drawn from observed changes in metabolic pathways and gene functions, they still require empirical validation. Future studies should consider employing in vivo infection models to directly assess the impact of malic acid on the severity of Candida infections.

Fig. 1: Flow chart of the methodology for identifying hub genes that modulate metabolic pathways of Candida albicans grown in malic acid as a carbon source.

Fig. 2: Transcriptomic Data Visualization. (a) Principal component analysis (PCA) biplot comparing the transcriptomic profiles of C. albicans treated with glucose versus malic acid. (b) Correlation of expression levels both between and within the sample groups.

Fig. 3: Sample Normalization in GSE223317. (a) Boxplots comparing sample expression profiles before and after normalization. (b) Mean-variance trends in gene expression. Left: voom transformation, showing the mean-variance relationship post-voom transformation. Right: final model, depicting the mean-variance trend after eBayes moderation.

Fig. 4: DEG identification with malic acid as carbon source. (a) Volcano plot displaying DEGs in the GSE223317 dataset. Green dots indicate upregulated genes, while blue dots represent downregulated genes. (b) Heatmap presenting the top 100 upregulated and downregulated genes in the dataset. Highly expressed genes are shown in red, while lowly expressed genes are displayed in blue.

Fig. 7: Heatmap of the hub genes identified through MCODE, Stress, Degree, Closeness, Bottleneck, and Betweenness analyses. A colored block signifies a gene's involvement in the cluster or its high score in the topological analysis.

Table 1: Experimental conditions employed to study the effect of different treatments on Candida albicans (SC5314 strain), retrieved from the GSE223317 dataset (Brandt et al., 2023).

Table 2: The top 10 DEGs with the highest increase in expression in the GSE223317 dataset

Table 3: The top 10 DEGs with the highest decrease in expression in the GSE223317 dataset