Part 7: KE enrichment score analysis and benchmarking for dataset: E-MEXP-3583
The AOP project ► Key objective 2
Author: Shakira Agata
This Jupyter Notebook shows the steps for executing the KE enrichment analysis and benchmarking it against Overrepresentation Analysis (ORA) for dataset E-MEXP-3583. The notebook is subdivided into nine sections:
- Section 1: Creation of dictKE dictionary
- Section 2: Creation of dictWP dictionary
- Section 3: Creation of KEgenes dictionary
- Section 4: Calculation of N variable
- Section 5: Comparison 1: Ag+ 24H
- Section 5.1: Calculation of n variable
- Section 5.2: Calculation of variable B and variable b
- Section 5.3: Calculation of enrichment score and hypergeometric p-value
- Section 5.4: Filtering results
- Section 5.5: Calculation of percent gene overlap
- Section 5.5.1 Creation of significant KE table
- Section 5.5.2 Significant ORA pathway table
- Section 5.5.3 Creation of for loop
- Section 5.5.4 Tabulation
- Section 5.5.5 Percent overlap calculation
- Section 6: Comparison 2: Ag+ 48H
- Section 6.1: Calculation of n variable
- Section 6.2: Calculation of variable B and variable b
- Section 6.3: Calculation of enrichment score and hypergeometric p-value
- Section 6.4: Filtering results
- Section 6.5: Calculation of percent gene overlap
- Section 6.5.1 Creation of significant KE table
- Section 6.5.2 Significant ORA pathway table
- Section 6.5.3 Creation of for loop
- Section 6.5.4 Tabulation
- Section 6.5.5 Percent overlap calculation
- Section 7: Comparison 3: AgNP 24H
- Section 7.1: Calculation of n variable
- Section 7.2: Calculation of variable B and variable b
- Section 7.3: Calculation of enrichment score and hypergeometric p-value
- Section 7.4: Filtering results
- Section 7.5: Calculation of percent gene overlap
- Section 7.5.1 Creation of significant KE table
- Section 7.5.2 Significant ORA pathway table
- Section 7.5.3 Creation of for loop
- Section 7.5.4 Tabulation
- Section 7.5.5 Percent overlap calculation
- Section 8: Comparison 4: AgNP 48H
- Section 8.1: Calculation of n variable
- Section 8.2: Calculation of variable B and variable b
- Section 8.3: Calculation of enrichment score and hypergeometric p-value
- Section 8.4: Filtering results
- Section 8.5: Calculation of percent gene overlap
- Section 8.5.1 Creation of significant KE table
- Section 8.5.2 Significant ORA pathway table
- Section 8.5.3 Creation of for loop
- Section 8.5.4 Tabulation
- Section 8.5.5 Percent overlap calculation
- Section 9: Metadata
Section 1: Creation of dictKE dictionary
In this section, the dictKE dictionary is created. It is used to retrieve the first neighbors of the key events present in the inflammatory stress response pathway AOP network.
Step 1. First, the necessary packages and the inflammatory stress response pathway AOP network are loaded.
import pandas as pd
import numpy as np
from scipy.stats import hypergeom
import matplotlib.pyplot as plt
import scipy.stats as ss
import py4cytoscape as p4c
p4c.cytoscape_ping()
p4c.cytoscape_version_info()
You are connected to Cytoscape!
{'apiVersion': 'v1',
'cytoscapeVersion': '3.10.1',
'automationAPIVersion': '1.9.0',
'py4cytoscapeVersion': '1.9.0'}
network=p4c.open_session('Agata,S.-Part4 Complete Molecular inflammation-process related AOP network.cys')
Opening C:\Users\shaki\Downloads\Agata,S.-Part4 Complete Molecular inflammation-process related AOP network.cys...
Step 2. Next, the nodetables are loaded in preparation for the dictionary creation.
nodetable=p4c.get_table_columns()
dataframe_for_dictKE=pd.read_excel('Nodetable-dictKE.xlsx')
df_corrected=pd.read_excel('nodetable-dictWP.xlsx').reset_index(drop=True)
Step 3. The dataframe is first reduced to the two columns needed for the dictionary: the IDs of the key events and the titles of the molecular pathways. The ID column is renamed to KEID.
completedataframe_for_dictKE=dataframe_for_dictKE[['ID (KEID)','WPtitle']].copy()
complete_dataframe_for_dictKE=completedataframe_for_dictKE.rename(columns={"ID (KEID)":"KEID"})
Step 4. The dataframe is now converted into a list of records: one dictionary per row, with the keys KEID and WPtitle.
dictKE= complete_dataframe_for_dictKE.to_dict('records')
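As a minimal sketch of what `to_dict('records')` returns, the toy dataframe below uses made-up KE IDs and pathway titles (assumptions for illustration, not values from the network). The result is a list with one dictionary per row, which is why later steps iterate over `dictKE` instead of looking keys up in it.

```python
import pandas as pd

# Toy stand-in for complete_dataframe_for_dictKE; IDs and titles are invented.
toy_ke = pd.DataFrame({
    "KEID": ["KE:1", "KE:2"],
    "WPtitle": ["Pathway A", "Pathway B"],
})

# One dictionary per row, keyed by column name.
toy_dictKE = toy_ke.to_dict("records")
print(toy_dictKE)
```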
Section 2: Creation of dictWP dictionary
In this section, the dictWP dictionary will be created. It contains the first neighbours (genes) of the individual molecular pathways mapped to the inflammatory stress response pathway AOP network.
Step 5. First, the dataframe is created in which the molecular pathways mapped to the network are filtered.
df4= nodetable[nodetable['type'] == 'Molecular pathway']
Step 6. A duplicate network will be created for which we will create filters to only contain gene and molecular pathway nodes in the network in preparation for the dictWP creation. This requires a composite filter to exclude Molecular Initiating Event (MIE) nodes, Key Event (KE) nodes and Adverse Outcome (AO) nodes.
Clonednetwork_fordictWP= p4c.clone_network()
p4c.rename_network('Cloned molecular inflammatory stress response pathway AOP network for dict WP')
{'network': 832855,
'title': 'Cloned molecular inflammatory stress response pathway AOP network for dict WP'}
MIEfilter= p4c.create_column_filter('MIE filter', 'type', 'MIE', 'CONTAINS', network='Cloned molecular inflammatory stress response pathway AOP network for dict WP')
No edges selected.
KEfilter= p4c.create_column_filter('KE filter', 'type', 'KE', 'CONTAINS',network='Cloned molecular inflammatory stress response pathway AOP network for dict WP')
No edges selected.
AOfilter= p4c.create_column_filter('AO filter', 'type', 'AO', 'CONTAINS',network='Cloned molecular inflammatory stress response pathway AOP network for dict WP')
No edges selected.
combined_MIEKEAOfilter= p4c.create_composite_filter('MIE KE AO filter', ['MIE filter','KE filter','AO filter'],type='ANY',network='Cloned molecular inflammatory stress response pathway AOP network for dict WP')
No edges selected.
Step 7. The nodes selected by the composite filter are deleted so that only the molecular pathway nodes and gene nodes remain.
Deletednodes= p4c.delete_selected_nodes(network='Cloned molecular inflammatory stress response pathway AOP network for dict WP')
Step 8. A for loop is used to create the WP dictionary, which contains the WP titles as keys and the names of the genes as values. Due to the settings of the get_first_neighbors function, it is not possible to retrieve the gene IDs with this function.
name_list_WP=df_corrected['name'].tolist()
dictWP = {}
for name in name_list_WP:
    gene_neighbors_per_WP = p4c.get_first_neighbors(node_names=name, network= 'Cloned molecular inflammatory stress response pathway AOP network for dict WP', as_nested_list=False)
    dictWP[name] = gene_neighbors_per_WP
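What the loop collects can be pictured without Cytoscape. The sketch below builds the same kind of pathway-to-genes mapping from a hypothetical edge list in plain Python. It does not call py4cytoscape, and it assumes that after the node deletion in Step 7 every neighbour of a pathway node is a gene node.

```python
from collections import defaultdict

# Hypothetical edge list (pathway title, gene symbol); not taken from the network.
toy_edges = [
    ("Pathway A", "GENE1"),
    ("Pathway A", "GENE2"),
    ("Pathway B", "GENE2"),
]

# Build an undirected adjacency map so every node knows its direct neighbours.
adjacency = defaultdict(set)
for source, target in toy_edges:
    adjacency[source].add(target)
    adjacency[target].add(source)

# Same shape as dictWP: pathway title -> list of neighbouring gene names.
toy_dictWP = {wp: sorted(adjacency[wp]) for wp in ["Pathway A", "Pathway B"]}
print(toy_dictWP)
```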
Section 3: Creation of KEgenes dictionary
In this section, the dictKE and dictWP dictionaries are matched, which links the keys of dictKE (KE IDs) to the values of dictWP (genes).
Step 9. The KE_genes_dictionary list holds the match between dictKE and dictWP: the gene list from dictWP is added to a dictKE record if the WPtitle of that record is present in dictWP.
KE_genes_dictionary=[]
for KEID in dictKE:
    WPtitle= KEID['WPtitle']
    if WPtitle in dictWP:
        KEID['gene'] = dictWP[WPtitle]
        KE_genes_dictionary.append(KEID)
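The matching step can be illustrated on toy inputs. The values below are invented, and the sketch assumes that records whose pathway title has no entry in the gene dictionary are skipped. It builds new dictionaries instead of modifying the input records, which is a design choice of this sketch and not necessarily what the notebook does.

```python
# Toy inputs with invented values, mirroring the shapes of dictKE and dictWP.
toy_dictKE = [
    {"KEID": "KE:1", "WPtitle": "Pathway A"},
    {"KEID": "KE:2", "WPtitle": "Pathway C"},  # no gene list available
]
toy_dictWP = {"Pathway A": ["GENE1", "GENE2"], "Pathway B": ["GENE2"]}

# Keep a record only when its pathway title has a gene list, and attach that list.
toy_KE_genes = [
    {**record, "gene": toy_dictWP[record["WPtitle"]]}
    for record in toy_dictKE
    if record["WPtitle"] in toy_dictWP
]
print(toy_KE_genes)
```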
Section 4: Calculation of N variable
In this section, variable N will be calculated per individual key event.
Step 10. First, the KEgenes dictionary is manipulated so that each gene is placed on an individual row. This requires the creation of a dataframe, adjustment of the column titles and explosion of the gene column.
first_dataframe=pd.DataFrame.from_dict(KE_genes_dictionary)
df5=df4.rename(columns={'name':'WPtitle'})
first_dataframe1=pd.merge(first_dataframe, df5, on='WPtitle')
second_dataframe=first_dataframe1.explode('gene')
second_dataframe1 = second_dataframe.drop(columns=['selected','AverageShortestPathLength','BetweennessCentrality','ClosenessCentrality','ClusteringCoefficient','group','type','Association type','CTL.Ext','CTL.Type','CTL.PathwayName','CTL.label','CTL.PathwayID','CTL.GeneName','CTL.GeneID','Eccentricity','EdgeCount','Indegree','IsSingleNode','NeighborhoodConnectivity','Outdegree','PartnerOfMultiEdgedNodePairs','SelfLoops','Stress','id','SUID'], axis=1)
second_dataframe1_reordered = second_dataframe.loc[:, ['KEID', 'WPtitle', 'shared name','gene']]
third_dataframe=second_dataframe1_reordered.rename(columns={'shared name':'ID'})
Step 11. The gene IDs that belong to the gene symbols are added by merging a gene table with the previous dataframe, third_dataframe. This yields a dataframe that contains all needed columns: KEID, WPtitle, WPID, gene symbol and gene ID.
df6= nodetable[nodetable['CTL.Type'] == 'gene']
df7=df6.rename(columns={'shared name':'gene'})
df8=df7.drop(columns=['name','selected','AverageShortestPathLength','BetweennessCentrality','ClosenessCentrality','ClusteringCoefficient','group','type','Association type','CTL.Ext','CTL.Type','CTL.PathwayName','CTL.label','CTL.PathwayID','CTL.GeneName','Eccentricity','EdgeCount','Indegree','IsSingleNode','NeighborhoodConnectivity','Outdegree','PartnerOfMultiEdgedNodePairs','SelfLoops','Stress','id','SUID'], axis=1)
mergeddataframe_gene=pd.merge(third_dataframe, df8, on='gene')
mergeddataframe_final=mergeddataframe_gene.rename(columns={'CTL.GeneID':'Entrez.Gene'})
Step 12. The following for loop is run to calculate the N variable. It iterates over each row of the dataframe and counts, per Key Event, the number of gene rows that belong to it.
variable_N_dictionary_count= {}
for index, row in mergeddataframe_final.iterrows():
    unique_KE = row['KEID']
    gene = row['Entrez.Gene']
    if unique_KE not in variable_N_dictionary_count:
        variable_N_dictionary_count[unique_KE] = 1
    else:
        variable_N_dictionary_count[unique_KE] += 1
print("The total is: ")
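Whether a gene is counted once or once per row matters when a gene is linked to the same key event through more than one pathway. The sketch below uses invented rows to contrast the two counts. Whether duplicates of this kind occur in `mergeddataframe_final` is not established here and would need to be checked on the real data.

```python
from collections import Counter, defaultdict

# Invented (KEID, gene ID) rows; gene "10" reaches KE:1 through two pathways.
toy_rows = [("KE:1", "10"), ("KE:1", "10"), ("KE:1", "11"), ("KE:2", "10")]

# Row-based count: what incrementing a counter once per row produces.
rows_per_ke = Counter(ke for ke, _ in toy_rows)

# Unique-gene count: collect genes in a set per KE, then take the set size.
genes_per_ke = defaultdict(set)
for ke, gene in toy_rows:
    genes_per_ke[ke].add(gene)
unique_per_ke = {ke: len(genes) for ke, genes in genes_per_ke.items()}

print(dict(rows_per_ke))
print(unique_per_ke)
```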
Step 13. The output of the dictionary will be converted into a dataframe and merged to the mergeddataframe_final dataframe to add the results into a separate column.
fourth_dataframe=pd.DataFrame.from_dict(variable_N_dictionary_count,orient='index')
df_reset = fourth_dataframe.reset_index()
df_reset.columns = ['KEID', 'N']
merged_dataframe= pd.merge(mergeddataframe_final, df_reset, on='KEID')
mergeddataframe=merged_dataframe.rename(columns={'ID':'WPID','gene':'Gene.Symbol'})
Section 5. Comparison 1: Ag+ 24H
Section 5.1 Calculation of n variable
In this section, variable n will be calculated for comparison 1.
Step 14. The table containing the genes differentially expressed relative to control is loaded and filtered for significance.
Ag24H_DEG= pd.read_csv('topTable_Ag._.1.3_24 - H2O.control_.0.0_24.tsv',sep='\t')
Ag_24H_DEG= Ag24H_DEG[Ag24H_DEG['adj. p-value'] < 0.05]
Ag_24H_DEG = Ag_24H_DEG.copy()
Ag_24H_DEG.rename(columns={Ag_24H_DEG.columns[0]: 'Entrez.Gene'}, inplace=True)
Ag_24H_DEG['Entrez.Gene'] = Ag_24H_DEG['Entrez.Gene'].astype(str)
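The cast to `str` matters for the merge that follows, because a merge key should have the same type in both tables. The sketch below uses invented tables and assumes that the gene IDs from the node table are stored as strings while the DEG table holds integers.

```python
import pandas as pd

# Invented tables: gene IDs stored as strings on one side, integers on the other.
toy_network = pd.DataFrame({"Entrez.Gene": ["10", "11"], "KEID": ["KE:1", "KE:1"]})
toy_deg = pd.DataFrame({"Entrez.Gene": [10, 12], "adj. p-value": [0.01, 0.20]})

# Cast the integer IDs to strings so both key columns share one type.
toy_deg["Entrez.Gene"] = toy_deg["Entrez.Gene"].astype(str)

# The default inner merge keeps only genes present in both tables.
toy_merged = pd.merge(toy_network, toy_deg, on="Entrez.Gene")
print(toy_merged)
```

Because the merge is an inner join, genes that appear in only one of the two tables are dropped from the result.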
Step 15. Here, the results of the DEG table are integrated into the mergeddataframe dataframe. This is followed by adjustment of the dataframe columns to remove non-relevant columns.
merged_dataframe_DEG= pd.merge(mergeddataframe,Ag_24H_DEG, on='Entrez.Gene')
mergeddataframeDEG= merged_dataframe_DEG.drop(['meanExpr'], axis=1)
Step 16. The following for loop is run over the key events to retrieve the n variable. It is comparable to the for loop for N, but adds a condition that checks the significance of each gene: the adjusted p-value must be smaller than 0.05.
variable_n_dictionary_count= {}
for index, row in mergeddataframeDEG.iterrows():
    unique_KE = row['KEID']
    gene_expression_value = row['adj. p-value']
    if gene_expression_value < 0.05:
        if unique_KE not in variable_n_dictionary_count:
            variable_n_dictionary_count[unique_KE] = 1
        else:
            variable_n_dictionary_count[unique_KE] += 1
print("The total number of significant genes: ")
Step 17. The output of the n variable dictionary is saved as a dataframe and integrated as a separate column into a dataframe.
n_variable_dataframe=pd.DataFrame.from_dict(variable_n_dictionary_count,orient='index')
n_variable_dataframe_reset = n_variable_dataframe.reset_index()
n_variable_dataframe_reset.columns = ['KEID', 'n']
merged_dataframe2= pd.merge(mergeddataframeDEG, n_variable_dataframe_reset, on='KEID')
Section 5.2. Calculation of variable B and variable b.
In this section, variable B and variable b are calculated.
Step 18. Variable B is calculated as the number of rows in the complete DEG table, which includes all genes of this comparison.
B=len(Ag24H_DEG.index)
B
20518
Step 19. Variable b is calculated as the number of rows in the DEG table after filtering for significance.
Ag_24H_DEG_filtered=Ag_24H_DEG[Ag_24H_DEG['adj. p-value'] < 0.05]
b=len(Ag_24H_DEG_filtered)
b
127
Section 5.3. Calculation of enrichment score and hypergeometric p-value
In this section, the enrichment score and hypergeometric p-value are calculated. The four variables of the enrichment score are collected per KE in an additional dataframe, to which the formula is then applied.
Step 20. The final dataframe will be created that contains the KEID and the four variables: variable N, variable n, variable B and variable b.
Final_dataframe_ES= merged_dataframe2.loc[:, ['KEID','N','n']].copy()
# Reuse B and b from Section 5.2 instead of hard-coding 20518 and 127
Final_dataframe_ES['B']=B
Final_dataframe_ES['b']=b
Final_Dataframe_ES=Final_dataframe_ES.drop_duplicates(subset=['KEID'],keep='first')
Final_Dataframe_ES.reset_index(drop=True,inplace=True)
Copy_Final_DataFrame_ES=Final_Dataframe_ES.copy()
Step 21. The following function calculates the enrichment score for individual key events. It is applied row by row and the results are saved as a separate column in the dataframe.
def calculate_Enrichment_Score(row):
    # The score is returned as text (f-string); convert it with pd.to_numeric before numeric comparisons
    return f"{(row['n']/row['N'])/(row['b']/row['B'])}"
Copy_Final_DataFrame_ES.loc[:,'Enrichmentscore']= Copy_Final_DataFrame_ES.apply(calculate_Enrichment_Score,axis=1)
Copy_Final_DataFrame_ES
| | KEID | N | n | B | b | Enrichmentscore |
---|---|---|---|---|---|---|
0 | https://identifiers.org/aop.events/1495 | 253 | 1 | 20518 | 127 | 0.6385733403877875 |
1 | https://identifiers.org/aop.events/1668 | 156 | 1 | 20518 | 127 | 1.0356349687058348 |
2 | https://identifiers.org/aop.events/244 | 417 | 1 | 20518 | 127 | 0.38743178685398133 |
3 | https://identifiers.org/aop.events/41 | 275 | 1 | 20518 | 127 | 0.5874874731567645 |
4 | https://identifiers.org/aop.events/1539 | 170 | 1 | 20518 | 127 | 0.9503473830477073 |
5 | https://identifiers.org/aop.events/618 | 240 | 1 | 20518 | 127 | 0.6731627296587926 |
6 | https://identifiers.org/aop.events/1497 | 528 | 6 | 20518 | 127 | 1.835898353614889 |
7 | https://identifiers.org/aop.events/1115 | 34 | 3 | 20518 | 127 | 14.25521074571561 |
8 | https://identifiers.org/aop.events/1917 | 166 | 1 | 20518 | 127 | 0.9732473199886159 |
9 | https://identifiers.org/aop.events/1633 | 1056 | 12 | 20518 | 127 | 1.835898353614889 |
10 | https://identifiers.org/aop.events/1392 | 102 | 9 | 20518 | 127 | 14.25521074571561 |
11 | https://identifiers.org/aop.events/1582 | 51 | 2 | 20518 | 127 | 6.335649220318048 |
12 | https://identifiers.org/aop.events/1896 | 205 | 1 | 20518 | 127 | 0.7880929517956596 |
13 | https://identifiers.org/aop.events/265 | 268 | 3 | 20518 | 127 | 1.8084968856504875 |
14 | https://identifiers.org/aop.events/1750 | 528 | 6 | 20518 | 127 | 1.835898353614889 |
15 | https://identifiers.org/aop.events/1848 | 195 | 1 | 20518 | 127 | 0.8285079749646679 |
16 | https://identifiers.org/aop.events/890 | 34 | 3 | 20518 | 127 | 14.25521074571561 |
17 | https://identifiers.org/aop.events/149 | 1056 | 12 | 20518 | 127 | 1.835898353614889 |
18 | https://identifiers.org/aop.events/1579 | 353 | 2 | 20518 | 127 | 0.9153487542102563 |
19 | https://identifiers.org/aop.events/249 | 34 | 3 | 20518 | 127 | 14.25521074571561 |
20 | https://identifiers.org/aop.events/288 | 51 | 1 | 20518 | 127 | 3.167824610159024 |
21 | https://identifiers.org/aop.events/209 | 617 | 5 | 20518 | 127 | 1.3092305925292562 |
22 | https://identifiers.org/aop.events/1945 | 1218 | 4 | 20518 | 127 | 0.5305716095832849 |
23 | https://identifiers.org/aop.events/1087 | 528 | 6 | 20518 | 127 | 1.835898353614889 |
24 | https://identifiers.org/aop.events/1538 | 34 | 3 | 20518 | 127 | 14.25521074571561 |
25 | https://identifiers.org/aop.events/341 | 10 | 1 | 20518 | 127 | 16.155905511811024 |
26 | https://identifiers.org/aop.events/1090 | 459 | 2 | 20518 | 127 | 0.7039610244797833 |
27 | https://identifiers.org/aop.events/352 | 398 | 3 | 20518 | 127 | 1.2177818224983183 |
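The formula inside `calculate_Enrichment_Score` is the fraction of significant genes within the key event, n/N, divided by the fraction of significant genes in the whole dataset, b/B. As a worked check, the sketch below recomputes the score for KE 1115 from the values in the table above. The helper name `enrichment_score` is introduced here for illustration only.

```python
def enrichment_score(n, N, b, B):
    """Ratio of the KE's significant fraction to the dataset's significant fraction."""
    return (n / N) / (b / B)

# Values copied from the row for KE 1115 in the table above.
score = enrichment_score(n=3, N=34, b=127, B=20518)
print(round(score, 4))  # 14.2552
```

A score above 1 means the key event contains proportionally more significant genes than the dataset as a whole.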
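Step 22. The following for loop calculates the hypergeometric p-value for each Key Event and saves the result as a separate column in the dataframe. This requires some intermediate steps to manipulate the dataframe.
p_value_dataframe=[]
for index, row in Copy_Final_DataFrame_ES.iterrows():
    M = row['B']  # population size: all genes in the DEG table
    n = row['b']  # number of significant genes in the population
    N = row['N']  # number of genes drawn: genes linked to the KE
    k = row['n']  # significant genes among the genes linked to the KE
    hpd = ss.hypergeom(M, n, N)
    p = hpd.pmf(k)  # probability of observing exactly k significant genes
    p_value_dataframe.append(p)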
Hypergeometricpvalue_dataframe=pd.DataFrame(p_value_dataframe)
Hypergeometricpvalue_dataframe.columns= ['Hypergeometric p-value']
merged_finaltable=pd.concat([Copy_Final_DataFrame_ES,Hypergeometricpvalue_dataframe],axis=1)
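`pmf(k)` gives the probability of observing exactly k significant genes. Over-representation tests are usually based on the upper tail instead, the probability of k or more, which in SciPy corresponds to `sf(k - 1)`. The sketch below computes both with the standard library only, for the KE 1115 values (B = 20518, b = 127, N = 34, n = 3). The helper names are introduced here for illustration, and the sketch shows the difference between the two quantities; it does not replace the values reported in this notebook.

```python
from math import comb

def hypergeom_pmf(k, M, n, N):
    """P(X = k) when drawing N genes from M, of which n are significant."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)

def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k): probability of k or more significant genes."""
    return sum(hypergeom_pmf(i, M, n, N) for i in range(k, min(n, N) + 1))

# Values for KE 1115: B = 20518, b = 127, N = 34, n = 3.
p_exact = hypergeom_pmf(3, M=20518, n=127, N=34)
p_tail = hypergeom_upper_tail(3, M=20518, n=127, N=34)
print(f"P(X = 3)  = {p_exact:.6e}")
print(f"P(X >= 3) = {p_tail:.6e}")
```

The upper-tail probability is always at least as large as the point probability, so a filter at 0.05 can give different results depending on which of the two is used.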
Section 5.4. Filtering the results for significant KEs
In this section, the results will be filtered to only include significant KEs. Significant KEs have an enrichment score above 1 and a hypergeometric p-value below 0.05.
Step 23. Lastly, the results are filtered to showcase the significant KEs for the comparison 1.
filteredversion= merged_finaltable[(pd.to_numeric(merged_finaltable['Enrichmentscore']) > 1) & (merged_finaltable['Hypergeometric p-value'] < 0.05)].copy()
filteredversion
| | KEID | N | n | B | b | Enrichmentscore | Hypergeometric p-value |
---|---|---|---|---|---|---|---|
7 | https://identifiers.org/aop.events/1115 | 34 | 3 | 20518 | 127 | 14.25521074571561 | 1.148290e-03 |
9 | https://identifiers.org/aop.events/1633 | 1056 | 12 | 20518 | 127 | 1.835898353614889 | 1.689462e-02 |
10 | https://identifiers.org/aop.events/1392 | 102 | 9 | 20518 | 127 | 14.25521074571561 | 1.337105e-08 |
11 | https://identifiers.org/aop.events/1582 | 51 | 2 | 20518 | 127 | 6.335649220318048 | 3.591109e-02 |
16 | https://identifiers.org/aop.events/890 | 34 | 3 | 20518 | 127 | 14.25521074571561 | 1.148290e-03 |
17 | https://identifiers.org/aop.events/149 | 1056 | 12 | 20518 | 127 | 1.835898353614889 | 1.689462e-02 |
19 | https://identifiers.org/aop.events/249 | 34 | 3 | 20518 | 127 | 14.25521074571561 | 1.148290e-03 |
24 | https://identifiers.org/aop.events/1538 | 34 | 3 | 20518 | 127 | 14.25521074571561 | 1.148290e-03 |
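Because `calculate_Enrichment_Score` returns an f-string, the `Enrichmentscore` column holds text, so it matters whether the threshold comparison is made on text or on numbers. The sketch below contrasts the two on illustrative values that are not taken from the table.

```python
# Scores stored as text, as produced by an f-string; values are illustrative.
scores_as_text = ["0.95", "1.0", "14.26", "1e-05"]

# Text comparison works character by character.
text_result = [s > "1" for s in scores_as_text]

# Numeric comparison after conversion to float.
numeric_result = [float(s) > 1 for s in scores_as_text]

print(text_result)     # [False, True, True, True]
print(numeric_result)  # [False, False, True, False]
```

The two comparisons agree for most values but differ for a score of exactly 1 and for very small scores that Python prints in scientific notation.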
# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
filteredversion = filteredversion.copy()
# Ensure numeric types
filteredversion['Hypergeometric p-value'] = pd.to_numeric(filteredversion['Hypergeometric p-value'], errors='coerce')
filteredversion['Enrichmentscore'] = pd.to_numeric(filteredversion['Enrichmentscore'], errors='coerce')
filteredversion['combined_score'] = -np.log10(filteredversion['Hypergeometric p-value']) * filteredversion['Enrichmentscore']
# Sort by combined score (highest first)
C1_sorted = filteredversion.sort_values(by='combined_score', ascending=False)
# Save the ranked table to Excel
C1_sorted.to_excel('ConsistentKE-Ag+-24h.xlsx')
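The combined score multiplies the negative base-10 logarithm of the p-value by the enrichment score, so key events with a small p-value and a high enrichment score rank first. As a small worked example, the sketch below ranks two rows copied from the filtered table above. The helper name `combined_score` is introduced here for illustration.

```python
import math

def combined_score(p_value, enrichment_score):
    """-log10(p) scaled by the enrichment score; larger means stronger evidence."""
    return -math.log10(p_value) * enrichment_score

# Two rows from the filtered table above: (KEID suffix, p-value, enrichment score).
rows = [("1115", 1.148290e-03, 14.25521074571561),
        ("1633", 1.689462e-02, 1.835898353614889)]

ranked = sorted(rows, key=lambda r: combined_score(r[1], r[2]), reverse=True)
print([keid for keid, _, _ in ranked])  # ['1115', '1633']
```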
Section 5.5. Calculation of percent gene overlap to ORA
Section 5.5.1 Creation of the significant KEs table
In this section, the gene table is filtered to retrieve the genes connected to only the significant KEs.
Step 24. The significant KE table is created by selecting the significant KEs from the earlier mergeddataframe_final dataframe.
significant_KEIDs=['https://identifiers.org/aop.events/1115','https://identifiers.org/aop.events/1633','https://identifiers.org/aop.events/1392','https://identifiers.org/aop.events/1582','https://identifiers.org/aop.events/890','https://identifiers.org/aop.events/149','https://identifiers.org/aop.events/249','https://identifiers.org/aop.events/1538']
significantKEID_genetable=mergeddataframe_final[mergeddataframe_final['KEID'].isin(significant_KEIDs)]
significantKEIDgenetable=significantKEID_genetable.drop(columns=['WPtitle','ID'])
Section 5.5.2 Significant ORA pathway table plus splitting
In this section, the significant ORA pathway table is created.
Step 25. The significant ORA pathway table is created using the significantly enriched pathways identified from the ORA analysis. This requires data manipulation to restructure the table so that each gene of an enriched pathway is placed on its own row.
datafile_ORA = pd.read_csv('WikiPathways_2024_Human.human.enrichr.reports.txt', sep='\t')
datafileORA=pd.DataFrame(datafile_ORA)
filtereddatafileORA=datafileORA[datafileORA['Adjusted P-value'] < 0.05]
# Make sure 'Combined Score' is numeric
datafileORA['Combined Score'] = pd.to_numeric(datafileORA['Combined Score'], errors='coerce')
# Sort by 'Combined Score' in descending order
ranked_df = datafileORA.sort_values(by='Combined Score', ascending=False)
# (Optional) Save to Excel
ranked_df.to_excel('Ag24H-ORAtable-thesis(EMEXP3583).xlsx', index=False)
dropped_datafileORA_df=filtereddatafileORA.drop(['Adjusted P-value','Odds Ratio','Old P-value','Gene_set','P-value','Old adjusted P-value','Combined Score'],axis=1)
droppeddatafileORAdf=dropped_datafileORA_df.copy()
droppeddatafileORAdf['Genes']= droppeddatafileORAdf['Genes'].replace({';':','},regex=True)
df_ORApathwaytable=droppeddatafileORAdf.copy()
df_ORApathwaytable['Genes'] = df_ORApathwaytable['Genes'].astype(str)
df_ORApathwaytable['Genes'] = df_ORApathwaytable['Genes'].str.split(',')
exploded_df_ORApathwaytable = df_ORApathwaytable.explode('Genes', ignore_index=True)
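The restructuring can be illustrated on a toy table. The rows below are invented and only assume the Enrichr-style layout used above, with one pathway per row and its genes joined by `;`. The sketch splits on `;` directly, which gives the same result as first replacing `;` with `,` and then splitting on `,`.

```python
import pandas as pd

# Invented Enrichr-style rows: one pathway per row, genes joined by ';'.
toy_ora = pd.DataFrame({
    "Term": ["Pathway A", "Pathway B"],
    "Genes": ["GENE1;GENE2", "GENE3"],
})

# Split the gene string into a list, then give each gene its own row.
toy_ora["Genes"] = toy_ora["Genes"].str.split(";")
toy_long = toy_ora.explode("Genes", ignore_index=True)
print(toy_long)
```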
Section 5.5.3 For loop to get overlapping genes
In this section, the number of overlapping genes between the significant enrichment score-based Key Events and the enriched pathways from ORA is calculated.
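The overlap between one pathway and one key event is a set intersection of their gene symbols. A minimal sketch on invented gene sets, shaped like the dictionaries built in this section:

```python
# Invented gene sets: pathway -> genes and key event -> genes.
toy_ora_sets = {"Pathway A": {"GENE1", "GENE2"}, "Pathway B": {"GENE3"}}
toy_ke_sets = {"KE:1": {"GENE2", "GENE4"}}

# Intersect every pathway with every key event and record the shared genes.
toy_overlaps = {
    (term, keid): ora_genes & ke_genes
    for term, ora_genes in toy_ora_sets.items()
    for keid, ke_genes in toy_ke_sets.items()
}
for (term, keid), shared in toy_overlaps.items():
    print(f"{term} x {keid}: {len(shared)} overlaps")
```

Every pathway and key event pair gets an entry, including pairs with no shared genes, for which the intersection is an empty set.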
Step 26. Next, two collections of gene sets are created by converting the significant KE table and the ORA pathway table into dictionaries in which the genes are grouped together per key.
ORA_gene_sets = exploded_df_ORApathwaytable.groupby('Term')['Genes'].apply(set).to_dict()
SignificantKE_gene_sets = significantKEIDgenetable.groupby('KEID')['gene'].apply(set).to_dict()
overlapping_genes_betweenORA_and_significantKEs = {}
for term, ORA_genes in ORA_gene_sets.items():
    for KEID, KEID_genes in SignificantKE_gene_sets.items():
        overlap = ORA_genes.intersection(KEID_genes)
        print(f"{term} x {KEID}: {len(overlap)} overlaps")
        overlapping_genes_betweenORA_and_significantKEs[(term, KEID)] = {
            'overlapping genes': overlap,
            'number of genes that overlap': len(overlap)
        }
if overlapping_genes_betweenORA_and_significantKEs:
    print("\ntitle of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:")
    for (term, KEID), result in overlapping_genes_betweenORA_and_significantKEs.items():
        print(f"Term: {term}, KEID: {KEID}, Title of overlapping gene(s): {result['overlapping genes']}, number: {result['number of genes that overlap']}")
else:
    print("No overlapping genes")
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1115: 1 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1392: 1 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/149: 0 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1538: 1 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1582: 0 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1633: 0 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/249: 1 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/890: 1 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/1115: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/1392: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/149: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/1538: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/1582: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/1633: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/249: 0 overlaps
Gastric Cancer Network 1 WP2361 x https://identifiers.org/aop.events/890: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/1115: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/1392: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/149: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/1538: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/1582: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/1633: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/249: 0 overlaps
Gastric Cancer Network 2 WP2363 x https://identifiers.org/aop.events/890: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1115: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1392: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/149: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1538: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1582: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1633: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/249: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/890: 0 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/1115: 2 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/1392: 2 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/149: 0 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/1538: 2 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/1582: 0 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/1633: 0 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/249: 2 overlaps
Melatonin Metabolism And Effects WP3298 x https://identifiers.org/aop.events/890: 2 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/1115: 3 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/1392: 3 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/149: 0 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/1538: 3 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/1582: 0 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/1633: 0 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/249: 3 overlaps
Oxidative Stress Response WP408 x https://identifiers.org/aop.events/890: 3 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/1115: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/1392: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/149: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/1538: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/1582: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/1633: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/249: 0 overlaps
Retinoblastoma Gene In Cancer WP2446 x https://identifiers.org/aop.events/890: 0 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1115: 1 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1392: 1 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/149: 0 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1538: 1 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1582: 0 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1633: 0 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/249: 1 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/890: 1 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/1115: 1 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/1392: 1 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/149: 0 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/1538: 1 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/1582: 0 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/1633: 0 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/249: 1 overlaps
Zinc Homeostasis WP3529 x https://identifiers.org/aop.events/890: 1 overlaps
title of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 1 WP2361, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): set(), number: 0
Term: Gastric Cancer Network 2 WP2363, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): set(), number: 0
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'CYP1A1', 'MAOA'}, number: 2
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'CYP1A1', 'MAOA'}, number: 2
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'CYP1A1', 'MAOA'}, number: 2
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'CYP1A1', 'MAOA'}, number: 2
Term: Melatonin Metabolism And Effects WP3298, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'CYP1A1', 'MAOA'}, number: 2
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'CYP1A1', 'MAOA', 'MT1X'}, number: 3
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'CYP1A1', 'MAOA', 'MT1X'}, number: 3
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'CYP1A1', 'MAOA', 'MT1X'}, number: 3
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'CYP1A1', 'MAOA', 'MT1X'}, number: 3
Term: Oxidative Stress Response WP408, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'CYP1A1', 'MAOA', 'MT1X'}, number: 3
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): set(), number: 0
Term: Retinoblastoma Gene In Cancer WP2446, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): set(), number: 0
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'CYP1A1'}, number: 1
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'CYP1A1'}, number: 1
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'CYP1A1'}, number: 1
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'CYP1A1'}, number: 1
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'CYP1A1'}, number: 1
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): set(), number: 0
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): set(), number: 0
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'MT1X'}, number: 1
Term: Zinc Homeostasis WP3529, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'MT1X'}, number: 1
Section 5.5.4 Tabulation gene overlap
In this section, a table is created that contains the number of overlapping genes and number of total genes in preparation for section 5.5.5.
final_geneoverlaptable_AG_24H=pd.DataFrame.from_dict(overlapping_genes_betweenORA_and_significantKEs,orient='index')
Section 5.5.5 Percent overlap calculation
In this section, the percent overlap for the gene sets is calculated.
Step 27. Lastly, the percent overlap is calculated and added as a column to the dataframe. First, a for loop counts the total number of genes belonging to each enriched pathway from ORA.
variable_count= {}
# Count the rows (one gene per row) for each enriched pathway term
for index, row in exploded_df_ORApathwaytable.iterrows():
    unique_KE = row['Term']  # holds the pathway term, despite the variable name
    gene_expression_value = row['Genes']
    if unique_KE not in variable_count:
        variable_count[unique_KE] = 1
    else:
        variable_count[unique_KE] += 1
print("The total number of genes: ")
print(variable_count)
The total number of genes:
{'Zinc Homeostasis WP3529': 10, 'Copper Homeostasis WP3286': 8, 'Gastric Cancer Network 1 WP2361': 4, 'Vitamin D Receptor Pathway WP2877': 7, 'Gastric Cancer Network 2 WP2363': 3, 'Glucocorticoid Receptor Pathway WP2880': 4, 'Oxidative Stress Response WP408': 3, 'Melatonin Metabolism And Effects WP3298': 3, 'Retinoblastoma Gene In Cancer WP2446': 4}
Step 28. The result is converted into a dataframe and added to the final dataframe.
variable_count_df=pd.DataFrame.from_dict(variable_count,orient='index')
reset_variable_count_df = variable_count_df.reset_index()
Reset_variable_count_df=reset_variable_count_df.copy()
Reset_variable_count_df.columns = ['Term', 'Total number of genes']
Genesetoverlaptable_AG24H=final_geneoverlaptable_AG_24H.reset_index(level=[1])
Genesetoverlaptable_AG24h=Genesetoverlaptable_AG24H.copy()
# Look up each row's pathway total by Term instead of hard-coding the values, so the totals stay aligned with the row order
Genesetoverlaptable_AG24h.insert(0, "Total number of genes", list(Genesetoverlaptable_AG24h.index.map(variable_count)))
table1=Genesetoverlaptable_AG24h.copy()
def calculate_Genesetoverlap_Score(row):
    # Percent of the pathway's genes that are also linked to the KE
    return (row['number of genes that overlap']/row['Total number of genes'])*100
table1.loc[:,'Percent geneset overlap']= table1.apply(calculate_Genesetoverlap_Score, axis=1)
table1.to_excel('geneoverlap-calculation-Ag24h.xlsx')
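As a quick check on the arithmetic, the sketch below recomputes the percent overlap for three pathway and KE pairs using the counts printed above. It is a minimal, pandas-free illustration; the helper name `percent_overlap` and the two small dictionaries are assumptions made for this example, not objects from the notebook.

```python
# Total genes per enriched pathway, copied from the variable_count output above
total_genes = {
    'Copper Homeostasis WP3286': 8,
    'Oxidative Stress Response WP408': 3,
    'Zinc Homeostasis WP3529': 10,
}
KE = 'https://identifiers.org/aop.events/1115'
# Overlap counts for KE 1115, copied from the overlap output above
overlap_counts = {
    ('Copper Homeostasis WP3286', KE): 1,
    ('Oxidative Stress Response WP408', KE): 3,
    ('Zinc Homeostasis WP3529', KE): 1,
}

def percent_overlap(n_overlap, n_total):
    """Hypothetical helper: share of a pathway's genes shared with a KE, in percent."""
    return n_overlap / n_total * 100

# Look up the total by pathway name so each pair uses its own pathway's total
percent_by_pair = {
    (term, ke): percent_overlap(count, total_genes[term])
    for (term, ke), count in overlap_counts.items()
}
for (term, ke), pct in percent_by_pair.items():
    print(f"{term} x {ke}: {pct:.1f}%")
```

Keying the lookup on the pathway name avoids depending on the order in which rows happen to appear in the table.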
Section 6. Comparison 2: Ag+ 48H
In this section, steps 14 to 28 are repeated for comparison 2.
Section 6.1 Calculation of n variable
Step 29. The table containing the differentially expressed genes for comparison 2 is loaded and filtered for significance (adjusted p-value below 0.05).
Ag48HDEG= pd.read_csv('topTable_Ag._.1.3_48 - H2O.control_.0.0_48.tsv',sep='\t')
Ag_48H_DEG= Ag48HDEG[Ag48HDEG['adj. p-value'] < 0.05]
Ag48H_DEG = Ag_48H_DEG.copy()
Ag48H_DEG.rename(columns={Ag48H_DEG.columns[0]: 'Entrez.Gene'}, inplace=True)
Ag48H_DEG['Entrez.Gene'] = Ag48H_DEG['Entrez.Gene'].astype(str)
Step 30. The results of the DEG table are next integrated into the mergeddataframe dataframe. This is followed by adjustment of the dataframe columns to remove non-relevant columns.
merged_dataframe_DEG_48h= pd.merge(mergeddataframe, Ag48H_DEG, on='Entrez.Gene')
Step 31. The following for loop over the Key Events is run to obtain the n variable. It mirrors the for loop used for N, but only counts genes that are significant, i.e. with an adjusted p-value below 0.05.
variable_n_dictionary_count2= {}
for index, row in merged_dataframe_DEG_48h.iterrows():
    unique_KE = row['KEID']
    gene_expression_value = row['adj. p-value']
    if gene_expression_value < 0.05:
        if unique_KE not in variable_n_dictionary_count2:
            variable_n_dictionary_count2[unique_KE] = 1
        else:
            variable_n_dictionary_count2[unique_KE] += 1
print("The total number of significant genes: ")
print(variable_n_dictionary_count2)
The total number of significant genes:
{'https://identifiers.org/aop.events/486': 2, 'https://identifiers.org/aop.events/875': 2, 'https://identifiers.org/aop.events/1495': 2, 'https://identifiers.org/aop.events/1668 ': 2, 'https://identifiers.org/aop.events/244 ': 2, 'https://identifiers.org/aop.events/1814 ': 2, 'https://identifiers.org/aop.events/1270': 1, 'https://identifiers.org/aop.events/457': 2, 'https://identifiers.org/aop.events/188': 1, 'https://identifiers.org/aop.events/618': 2, 'https://identifiers.org/aop.events/2006': 1, 'https://identifiers.org/aop.events/1497': 3, 'https://identifiers.org/aop.events/1669': 1, 'https://identifiers.org/aop.events/202': 2, 'https://identifiers.org/aop.events/1633': 6, 'https://identifiers.org/aop.events/1815': 2, 'https://identifiers.org/aop.events/386': 2, 'https://identifiers.org/aop.events/1496': 4, 'https://identifiers.org/aop.events/68': 2, 'https://identifiers.org/aop.events/1493': 8, 'https://identifiers.org/aop.events/265': 2, 'https://identifiers.org/aop.events/1750': 3, 'https://identifiers.org/aop.events/1848': 3, 'https://identifiers.org/aop.events/149': 6, 'https://identifiers.org/aop.events/1579': 3, 'https://identifiers.org/aop.events/209': 4, 'https://identifiers.org/aop.events/1498': 2, 'https://identifiers.org/aop.events/1500': 2, 'https://identifiers.org/aop.events/1488': 2, 'https://identifiers.org/aop.events/52': 2, 'https://identifiers.org/aop.events/484': 2, 'https://identifiers.org/aop.events/388': 2, 'https://identifiers.org/aop.events/1945': 4, 'https://identifiers.org/aop.events/2012': 2, 'https://identifiers.org/aop.events/1818': 2, 'https://identifiers.org/aop.events/1585': 1, 'https://identifiers.org/aop.events/1087': 3, 'https://identifiers.org/aop.events/195': 2, 'https://identifiers.org/aop.events/1090': 2, 'https://identifiers.org/aop.events/1841': 1}
Step 32. The n variable dictionary is converted to a dataframe and merged as a separate column into the main dataframe.
n_variable_dataframe2=pd.DataFrame.from_dict(variable_n_dictionary_count2,orient='index')
n_variable_dataframe2
index | 0 |
---|---|
https://identifiers.org/aop.events/486 | 2 |
https://identifiers.org/aop.events/875 | 2 |
https://identifiers.org/aop.events/1495 | 2 |
https://identifiers.org/aop.events/1668 | 2 |
https://identifiers.org/aop.events/244 | 2 |
https://identifiers.org/aop.events/1814 | 2 |
https://identifiers.org/aop.events/1270 | 1 |
https://identifiers.org/aop.events/457 | 2 |
https://identifiers.org/aop.events/188 | 1 |
https://identifiers.org/aop.events/618 | 2 |
https://identifiers.org/aop.events/2006 | 1 |
https://identifiers.org/aop.events/1497 | 3 |
https://identifiers.org/aop.events/1669 | 1 |
https://identifiers.org/aop.events/202 | 2 |
https://identifiers.org/aop.events/1633 | 6 |
https://identifiers.org/aop.events/1815 | 2 |
https://identifiers.org/aop.events/386 | 2 |
https://identifiers.org/aop.events/1496 | 4 |
https://identifiers.org/aop.events/68 | 2 |
https://identifiers.org/aop.events/1493 | 8 |
https://identifiers.org/aop.events/265 | 2 |
https://identifiers.org/aop.events/1750 | 3 |
https://identifiers.org/aop.events/1848 | 3 |
https://identifiers.org/aop.events/149 | 6 |
https://identifiers.org/aop.events/1579 | 3 |
https://identifiers.org/aop.events/209 | 4 |
https://identifiers.org/aop.events/1498 | 2 |
https://identifiers.org/aop.events/1500 | 2 |
https://identifiers.org/aop.events/1488 | 2 |
https://identifiers.org/aop.events/52 | 2 |
https://identifiers.org/aop.events/484 | 2 |
https://identifiers.org/aop.events/388 | 2 |
https://identifiers.org/aop.events/1945 | 4 |
https://identifiers.org/aop.events/2012 | 2 |
https://identifiers.org/aop.events/1818 | 2 |
https://identifiers.org/aop.events/1585 | 1 |
https://identifiers.org/aop.events/1087 | 3 |
https://identifiers.org/aop.events/195 | 2 |
https://identifiers.org/aop.events/1090 | 2 |
https://identifiers.org/aop.events/1841 | 1 |
n_variable_dataframe_reset2 = n_variable_dataframe2.reset_index()
n_variable_dataframe_reset2.columns = ['KEID', 'n']
n_variable_dataframe_reset2
index | KEID | n |
---|---|---|
0 | https://identifiers.org/aop.events/486 | 2 |
1 | https://identifiers.org/aop.events/875 | 2 |
2 | https://identifiers.org/aop.events/1495 | 2 |
3 | https://identifiers.org/aop.events/1668 | 2 |
4 | https://identifiers.org/aop.events/244 | 2 |
5 | https://identifiers.org/aop.events/1814 | 2 |
6 | https://identifiers.org/aop.events/1270 | 1 |
7 | https://identifiers.org/aop.events/457 | 2 |
8 | https://identifiers.org/aop.events/188 | 1 |
9 | https://identifiers.org/aop.events/618 | 2 |
10 | https://identifiers.org/aop.events/2006 | 1 |
11 | https://identifiers.org/aop.events/1497 | 3 |
12 | https://identifiers.org/aop.events/1669 | 1 |
13 | https://identifiers.org/aop.events/202 | 2 |
14 | https://identifiers.org/aop.events/1633 | 6 |
15 | https://identifiers.org/aop.events/1815 | 2 |
16 | https://identifiers.org/aop.events/386 | 2 |
17 | https://identifiers.org/aop.events/1496 | 4 |
18 | https://identifiers.org/aop.events/68 | 2 |
19 | https://identifiers.org/aop.events/1493 | 8 |
20 | https://identifiers.org/aop.events/265 | 2 |
21 | https://identifiers.org/aop.events/1750 | 3 |
22 | https://identifiers.org/aop.events/1848 | 3 |
23 | https://identifiers.org/aop.events/149 | 6 |
24 | https://identifiers.org/aop.events/1579 | 3 |
25 | https://identifiers.org/aop.events/209 | 4 |
26 | https://identifiers.org/aop.events/1498 | 2 |
27 | https://identifiers.org/aop.events/1500 | 2 |
28 | https://identifiers.org/aop.events/1488 | 2 |
29 | https://identifiers.org/aop.events/52 | 2 |
30 | https://identifiers.org/aop.events/484 | 2 |
31 | https://identifiers.org/aop.events/388 | 2 |
32 | https://identifiers.org/aop.events/1945 | 4 |
33 | https://identifiers.org/aop.events/2012 | 2 |
34 | https://identifiers.org/aop.events/1818 | 2 |
35 | https://identifiers.org/aop.events/1585 | 1 |
36 | https://identifiers.org/aop.events/1087 | 3 |
37 | https://identifiers.org/aop.events/195 | 2 |
38 | https://identifiers.org/aop.events/1090 | 2 |
39 | https://identifiers.org/aop.events/1841 | 1 |
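The counting loop in step 31 can also be sketched with `collections.Counter`. The example below uses a few made-up rows (the p-values are illustrative, not taken from the dataset) and strips whitespace from the KEID before counting, since some of the keys printed in the dictionary above carry a trailing space.

```python
from collections import Counter

# Illustrative (KEID, adjusted p-value) rows; the values are made up for this sketch
rows = [
    ('https://identifiers.org/aop.events/1668 ', 0.01),  # note the trailing space
    ('https://identifiers.org/aop.events/1668', 0.03),
    ('https://identifiers.org/aop.events/149', 0.20),    # not significant, skipped
    ('https://identifiers.org/aop.events/149', 0.04),
]

n_counts = Counter(
    keid.strip()            # normalise the identifier before counting
    for keid, adj_p in rows
    if adj_p < 0.05         # same significance cut-off as in step 31
)
print(dict(n_counts))
```

Without the `strip()` call, the two spellings of KE 1668 would be counted as separate keys.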
merged_dataframe2= pd.merge(mergeddataframeDEG, n_variable_dataframe_reset2, on='KEID')
Section 6.2. Calculation of variable B and variable b.
In this section, variable B and variable b are calculated.
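Step 33. Variable B is the total number of genes in the DEG table. Because Ag48H_DEG was already filtered for significance in step 29, the length taken below counts only the significant genes; the value used for B in step 35 is 20518.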
B=len(Ag48H_DEG.index)
B
30
Step 34. Variable b is the number of genes in the DEG table that pass the significance filter (adjusted p-value below 0.05).
Ag48H_DEG_filtered=Ag48H_DEG[Ag48H_DEG['adj. p-value'] < 0.05]
b=len(Ag48H_DEG_filtered)
b
30
Section 6.3. Calculation of enrichment score and hypergeometric p-value
In this section, the enrichment score and hypergeometric p-value will be calculated.
Step 35. The final dataframe will be created that contains the KEID and the four variables: variable N, variable n, variable B and variable b.
Final_dataframe_ES= merged_dataframe2.loc[:, ['KEID','N','n']]
Final_dataframe_ES['B']=pd.Series([20518 for x in range(len(Final_dataframe_ES.index))])
Final_dataframe_ES['b']=pd.Series([30 for x in range(len(Final_dataframe_ES.index))])
Final_Dataframe_ES=Final_dataframe_ES.drop_duplicates(subset=['KEID'],keep='first')
Final_Dataframe_ES.reset_index(drop=True,inplace=True)
Copy_Final_DataFrame_ES=Final_Dataframe_ES.copy()
Step 36. The following function calculates the enrichment score for each Key Event as (n/N)/(b/B), and the results are saved as a separate column in the dataframe.
def calculate_Enrichment_Score(row):
    # Enrichment score = (n/N) / (b/B), returned as a formatted string
    return f"{(row['n']/row['N'])/(row['b']/row['B'])}"
Copy_Final_DataFrame_ES.loc[:,'Enrichmentscore']= Copy_Final_DataFrame_ES.apply(calculate_Enrichment_Score,axis=1)
Copy_Final_DataFrame_ES
index | KEID | N | n | B | b | Enrichmentscore |
---|---|---|---|---|---|---|
0 | https://identifiers.org/aop.events/1495 | 253 | 2 | 20518 | 30 | 5.406587615283267 |
1 | https://identifiers.org/aop.events/1668 | 156 | 2 | 20518 | 30 | 8.768376068376067 |
2 | https://identifiers.org/aop.events/244 | 417 | 2 | 20518 | 30 | 3.2802557953637086 |
3 | https://identifiers.org/aop.events/618 | 240 | 2 | 20518 | 30 | 5.699444444444444 |
4 | https://identifiers.org/aop.events/1497 | 528 | 3 | 20518 | 30 | 3.8859848484848483 |
5 | https://identifiers.org/aop.events/1633 | 1056 | 6 | 20518 | 30 | 3.8859848484848483 |
6 | https://identifiers.org/aop.events/265 | 268 | 2 | 20518 | 30 | 5.103980099502487 |
7 | https://identifiers.org/aop.events/1750 | 528 | 3 | 20518 | 30 | 3.8859848484848483 |
8 | https://identifiers.org/aop.events/1848 | 195 | 3 | 20518 | 30 | 10.522051282051281 |
9 | https://identifiers.org/aop.events/149 | 1056 | 6 | 20518 | 30 | 3.8859848484848483 |
10 | https://identifiers.org/aop.events/1579 | 353 | 3 | 20518 | 30 | 5.812464589235128 |
11 | https://identifiers.org/aop.events/209 | 617 | 4 | 20518 | 30 | 4.433927606699081 |
12 | https://identifiers.org/aop.events/1945 | 1218 | 4 | 20518 | 30 | 2.246086480569239 |
13 | https://identifiers.org/aop.events/1087 | 528 | 3 | 20518 | 30 | 3.8859848484848483 |
14 | https://identifiers.org/aop.events/1090 | 459 | 2 | 20518 | 30 | 2.9801016702977488 |
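To make the formula concrete, the sketch below recomputes the enrichment score for the first row of the table, ES = (n/N)/(b/B). The function name `enrichment_score` is an assumption for this example; unlike the notebook function it returns a float rather than a formatted string.

```python
def enrichment_score(n, N, b, B):
    """(n/N) / (b/B): share of significant genes in the KE relative to the share in the whole table."""
    return (n / N) / (b / B)

# Row 0 of the table above: KE 1495 with N=253, n=2, B=20518, b=30
es_1495 = enrichment_score(n=2, N=253, b=30, B=20518)
print(round(es_1495, 4))  # 5.4066
```

A score above 1 means the KE contains proportionally more significant genes than the table as a whole.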
Step 37. The following for loop calculates the hypergeometric p-value for each Key Event and saves the result as a separate column in the dataframe. This requires some intermediate steps to manipulate the dataframe.
p_value_dataframe2=[]
for index, row in Copy_Final_DataFrame_ES.iterrows():
    M = row['B']  # population size: all genes in the DEG table
    n = row['b']  # number of significant genes
    N = row['N']  # number of genes linked to the KE
    k = row['n']  # number of significant genes linked to the KE
    hpd = ss.hypergeom(M, n, N)
    p = hpd.pmf(k)  # probability of observing exactly k significant genes
    p_value_dataframe2.append(p)
Hypergeometricpvalue_dataframe2=pd.DataFrame(p_value_dataframe2)
Hypergeometricpvalue_dataframe2.columns= ['Hypergeometric p-value']
Hypergeometricpvalue_dataframe2
index | Hypergeometric p-value |
---|---|
0 | 0.046663 |
1 | 0.020231 |
2 | 0.101112 |
3 | 0.042743 |
4 | 0.034153 |
5 | 0.003083 |
6 | 0.051296 |
7 | 0.034153 |
8 | 0.002662 |
9 | 0.003083 |
10 | 0.012880 |
11 | 0.010082 |
12 | 0.069281 |
13 | 0.034153 |
14 | 0.115558 |
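The p-values above are point probabilities: `pmf(k)` is the probability of observing exactly k significant genes in a KE. Over-representation tests often report the upper tail P(X >= k) instead. The standard-library sketch below computes both for the first row so the two can be compared; the function names are assumptions for this example, and it is not a replacement for the call used in the notebook.

```python
from math import comb

def hypergeom_pmf(k, M, n, N):
    """P(X = k) when N items are drawn from a population of M containing n successes."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)

def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k): sum of the point probabilities from k up to the largest possible value."""
    return sum(hypergeom_pmf(i, M, n, N) for i in range(k, min(n, N) + 1))

# Row 0 of the enrichment table: B=20518, b=30, N=253, n=2
p_exact = hypergeom_pmf(2, M=20518, n=30, N=253)
p_tail = hypergeom_upper_tail(2, M=20518, n=30, N=253)
print(f"P(X = 2)  = {p_exact:.6f}")
print(f"P(X >= 2) = {p_tail:.6f}")
```

The tail probability is never smaller than the point probability, so switching between the two can change which KEs fall below the 0.05 threshold.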
merged_finaltable=pd.concat([Copy_Final_DataFrame_ES,Hypergeometricpvalue_dataframe2],axis=1)
Section 6.4. Filtering the results for significant KEs
In this section, the results will be filtered to only include significant KEs. Significant KEs have an enrichment score above 1 and a hypergeometric p-value below 0.05.
Step 38. Lastly, we filter the results to showcase the significant KEs for comparison 2.
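# Compare the enrichment score numerically (it is stored as a string) and take a copy so later column assignments do not act on a slice
filteredversion_Ag48H= merged_finaltable[(pd.to_numeric(merged_finaltable['Enrichmentscore']) > 1) & (merged_finaltable['Hypergeometric p-value'] < 0.05)].copy()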
filteredversion_Ag48H
index | KEID | N | n | B | b | Enrichmentscore | Hypergeometric p-value |
---|---|---|---|---|---|---|---|
0 | https://identifiers.org/aop.events/1495 | 253 | 2 | 20518 | 30 | 5.406587615283267 | 0.046663 |
1 | https://identifiers.org/aop.events/1668 | 156 | 2 | 20518 | 30 | 8.768376068376067 | 0.020231 |
3 | https://identifiers.org/aop.events/618 | 240 | 2 | 20518 | 30 | 5.699444444444444 | 0.042743 |
4 | https://identifiers.org/aop.events/1497 | 528 | 3 | 20518 | 30 | 3.8859848484848483 | 0.034153 |
5 | https://identifiers.org/aop.events/1633 | 1056 | 6 | 20518 | 30 | 3.8859848484848483 | 0.003083 |
7 | https://identifiers.org/aop.events/1750 | 528 | 3 | 20518 | 30 | 3.8859848484848483 | 0.034153 |
8 | https://identifiers.org/aop.events/1848 | 195 | 3 | 20518 | 30 | 10.522051282051281 | 0.002662 |
9 | https://identifiers.org/aop.events/149 | 1056 | 6 | 20518 | 30 | 3.8859848484848483 | 0.003083 |
10 | https://identifiers.org/aop.events/1579 | 353 | 3 | 20518 | 30 | 5.812464589235128 | 0.012880 |
11 | https://identifiers.org/aop.events/209 | 617 | 4 | 20518 | 30 | 4.433927606699081 | 0.010082 |
13 | https://identifiers.org/aop.events/1087 | 528 | 3 | 20518 | 30 | 3.8859848484848483 | 0.034153 |
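Because the Enrichmentscore column holds formatted strings, any threshold comparison should be made on numeric values: string comparison is lexicographic. The short sketch below, with made-up values, shows how the two comparisons can disagree.

```python
scores = ["10.52", "0.85", "2.25"]  # made-up scores stored as strings

# Lexicographic comparison: "10.52" sorts before "2" because '1' comes before '2'
as_strings = [s > "2" for s in scores]

# Numeric comparison after converting each score to a float
as_numbers = [float(s) > 2 for s in scores]

print(as_strings)
print(as_numbers)
```

With a threshold of 1 and positive scores the two usually agree, but converting first removes the dependence on how the numbers happen to be written.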
# Ensure numeric types
filteredversion_Ag48H['Hypergeometric p-value'] = pd.to_numeric(filteredversion_Ag48H['Hypergeometric p-value'], errors='coerce')
filteredversion_Ag48H['Enrichmentscore'] = pd.to_numeric(filteredversion_Ag48H['Enrichmentscore'], errors='coerce')
filteredversion_Ag48H['combined_score'] = -np.log10(filteredversion_Ag48H['Hypergeometric p-value']) * filteredversion_Ag48H['Enrichmentscore']
# Sort by combined score (highest first)
C2_sorted = filteredversion_Ag48H.sort_values(by='combined_score', ascending=False)
# Save the ranked table to Excel
C2_sorted.to_excel('ConsistentKE-Ag-48h.xlsx')
Section 6.5. Calculation of percent gene overlap to ORA
Section 6.5.1 Creation of the significant KEs table
In this section, the dataframes are merged to retrieve the genes linked to only the significant KEs.
Step 39. The significant KE table is created using the significant KEs from the previous mergeddataframe_final.
mergeddataframe_final2=mergeddataframe_final.copy()
mergeddataframe_final2['KEID'] = mergeddataframe_final2['KEID'].str.strip()
# Significant KEs for comparison 2, taken from the filtered table above
significant_KEIDs2 = [
    'https://identifiers.org/aop.events/1668',
    'https://identifiers.org/aop.events/1495',
    'https://identifiers.org/aop.events/618',
    'https://identifiers.org/aop.events/1497',
    'https://identifiers.org/aop.events/1633',
    'https://identifiers.org/aop.events/1750',
    'https://identifiers.org/aop.events/1848',
    'https://identifiers.org/aop.events/149',
    'https://identifiers.org/aop.events/1579',
    'https://identifiers.org/aop.events/209',
    'https://identifiers.org/aop.events/1087',
]
significantKEID_genetable2=mergeddataframe_final2[mergeddataframe_final2['KEID'].isin(significant_KEIDs2)]
significantKEIDgenetable2=significantKEID_genetable2.drop(columns=['WPtitle','ID'])
Section 6.5.2 Significant ORA pathway table plus splitting
In this section, the significant ORA pathway table is created.
Step 40. The significant ORA pathway table is created using the significantly enriched pathways identified from the ORA analysis. This requires data manipulation to restructure the table so that each gene of an enriched pathway is placed on its own row.
datafile_ORA2 = pd.read_csv("C:/Users/shaki/Downloads/ORA_tables_for_comparison/Comparison 2-Ag 48H.txt", sep='\t')
datafileORA2=pd.DataFrame(datafile_ORA2)
filtereddatafileORA_2=datafileORA2[datafileORA2['Adjusted P-value'] < 0.05]
filtereddatafileORA_2
index | Gene_set | Term | P-value | Adjusted P-value | Old P-value | Old adjusted P-value | Odds Ratio | Combined Score | Genes |
---|---|---|---|---|---|---|---|---|---|
0 | WikiPathways_2024_Human | Platelet Mediated Interactions W Vascular And ... | 0.000259 | 0.013650 | 0 | 0 | 101.071600 | 834.69240 | CCL2;TLR4 |
1 | WikiPathways_2024_Human | P53 Transcriptional Gene Network WP4963 | 0.000266 | 0.013650 | 0 | 0 | 27.364940 | 225.27640 | CCL2;ULBP1;SERPINB5 |
2 | WikiPathways_2024_Human | Network Map Of SARS CoV 2 Signaling WP5115 | 0.000428 | 0.013650 | 0 | 0 | 12.948480 | 100.44810 | IFITM3;CCL2;PTGS2;CXCL5 |
3 | WikiPathways_2024_Human | LDL Influence On CD14 And TLR4 WP5272 | 0.000479 | 0.013650 | 0 | 0 | 72.172840 | 551.61330 | CCL2;TLR4 |
4 | WikiPathways_2024_Human | Spinal Cord Injury WP2431 | 0.000564 | 0.013650 | 0 | 0 | 20.985580 | 156.97910 | CCL2;PTGS2;TLR4 |
5 | WikiPathways_2024_Human | Interactions Immune Cells And miRNAs In Tumor ... | 0.000713 | 0.014382 | 0 | 0 | 58.279200 | 422.28080 | CCL2;TLR4 |
6 | WikiPathways_2024_Human | Immune Infiltration In Pancreatic Cancer WP5285 | 0.001385 | 0.023934 | 0 | 0 | 40.930930 | 269.42170 | CCL2;CXCL5 |
7 | WikiPathways_2024_Human | Fibrin Complement Receptor 3 Signaling WP4136 | 0.001605 | 0.024270 | 0 | 0 | 37.855560 | 243.59600 | CCL2;TLR4 |
8 | WikiPathways_2024_Human | SARS CoV 2 Innate Immunity Evasion And Cell Im... | 0.003914 | 0.048148 | 0 | 0 | 23.631940 | 130.99350 | CCL2;CXCL5 |
9 | WikiPathways_2024_Human | Glucocorticoid Receptor Pathway WP2880 | 0.004270 | 0.048148 | 0 | 0 | 22.570480 | 123.14720 | CCL2;PTGS2 |
10 | WikiPathways_2024_Human | Non Genomic Actions Of 1 25 Dihydroxyvitamin D... | 0.004767 | 0.048148 | 0 | 0 | 21.294730 | 113.84380 | CCL2;TLR4 |
11 | WikiPathways_2024_Human | Burn Wound Healing WP5055 | 0.004895 | 0.048148 | 0 | 0 | 20.997940 | 111.70010 | CCL2;TLR4 |
12 | WikiPathways_2024_Human | Cytokine Cytokine Receptor Interaction WP5473 | 0.005173 | 0.048148 | 0 | 0 | 9.452663 | 49.76173 | TNFSF15;CCL2;CXCL5 |
# Make sure 'Combined Score' is numeric
datafileORA2['Combined Score'] = pd.to_numeric(datafileORA2['Combined Score'], errors='coerce')
# Sort by 'Combined Score' in descending order
ranked_df2 = datafileORA2.sort_values(by='Combined Score', ascending=False)
# (Optional) Save to Excel
ranked_df2.to_excel('Ag48H-ORAtable-thesis(EMEXP3583).xlsx', index=False)
dropped_datafileORA_df2=filtereddatafileORA_2.drop(['Adjusted P-value','Odds Ratio','Old P-value','Gene_set','P-value','Old adjusted P-value','Combined Score'],axis=1)
droppeddatafileORAdf2=dropped_datafileORA_df2.copy()
droppeddatafileORAdf2['Genes']= droppeddatafileORAdf2['Genes'].replace({';':','},regex=True)
df2_ORApathwaytable=droppeddatafileORAdf2.copy()
df2_ORApathwaytable['Genes'] = df2_ORApathwaytable['Genes'].astype(str)
df2_ORApathwaytable['Genes'] = df2_ORApathwaytable['Genes'].str.split(',')
exploded_df2_ORApathwaytable = df2_ORApathwaytable.explode('Genes', ignore_index=True)
Section 6.5.3 For loop to get overlapping genes
In this section, the number of overlapping genes between the significant enrichment score-based Key Events and the enriched pathways from ORA is calculated.
Step 41. Next, two sets are created by converting the significant KE table and ORA pathway table into dictionaries where the values of the genes are grouped together per key. This is followed by running a for loop to calculate the number of overlapping genes along with the symbols.
# Build {Term: set of genes} and {KEID: set of genes}
ORA_gene_sets2 = exploded_df2_ORApathwaytable.groupby('Term')['Genes'].apply(set).to_dict()
SignificantKE_gene_sets2 = significantKEIDgenetable2.groupby('KEID')['gene'].apply(set).to_dict()
overlapping_genes_betweenORA_and_significantKEs2 = {}
# Intersect every ORA pathway gene set with every significant KE gene set
for term, ORA_genes in ORA_gene_sets2.items():
    for KEID, KEID_genes in SignificantKE_gene_sets2.items():
        overlap = ORA_genes.intersection(KEID_genes)
        print(f"{term} x {KEID}: {len(overlap)} overlaps")
        overlapping_genes_betweenORA_and_significantKEs2[(term, KEID)] = {
            'overlapping genes': overlap,
            'number of genes that overlap': len(overlap)
        }
# Every pair is stored, including empty overlaps, so this branch runs whenever both tables contain at least one group
if overlapping_genes_betweenORA_and_significantKEs2:
    print("\ntitle of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:")
    for (term, KEID), result in overlapping_genes_betweenORA_and_significantKEs2.items():
        print(f"Term: {term}, KEID: {KEID}, Title of overlapping gene(s): {result['overlapping genes']}, number: {result['number of genes that overlap']}")
else:
    print("No overlapping genes")
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1087: 2 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/149: 2 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1495: 1 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1497: 2 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1579: 1 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1633: 2 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1668: 1 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1750: 2 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/1848: 1 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/209: 1 overlaps
Burn Wound Healing WP5055 x https://identifiers.org/aop.events/618: 0 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1087: 1 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/149: 1 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1495: 0 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1497: 1 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1579: 2 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1633: 1 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1668: 0 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1750: 1 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/1848: 0 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/209: 1 overlaps
Cytokine Cytokine Receptor Interaction WP5473 x https://identifiers.org/aop.events/618: 0 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1087: 2 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/149: 2 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1495: 1 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1497: 2 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1579: 1 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1633: 2 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1668: 1 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1750: 2 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/1848: 1 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/209: 1 overlaps
Fibrin Complement Receptor 3 Signaling WP4136 x https://identifiers.org/aop.events/618: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1087: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/149: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1495: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1497: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1579: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1633: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1668: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1750: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/1848: 0 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/209: 1 overlaps
Glucocorticoid Receptor Pathway WP2880 x https://identifiers.org/aop.events/618: 0 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1087: 1 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/149: 1 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1495: 0 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1497: 1 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1579: 2 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1633: 1 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1668: 0 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1750: 1 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/1848: 0 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/209: 1 overlaps
Immune Infiltration In Pancreatic Cancer WP5285 x https://identifiers.org/aop.events/618: 0 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1087: 2 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/149: 2 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1495: 1 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1497: 2 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1579: 1 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1633: 2 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1668: 1 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1750: 2 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/1848: 1 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/209: 1 overlaps
Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559 x https://identifiers.org/aop.events/618: 0 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1087: 2 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/149: 2 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1495: 1 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1497: 2 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1579: 1 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1633: 2 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1668: 1 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1750: 2 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/1848: 1 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/209: 1 overlaps
LDL Influence On CD14 And TLR4 WP5272 x https://identifiers.org/aop.events/618: 0 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1087: 1 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/149: 1 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1495: 0 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1497: 1 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1579: 2 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1633: 1 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1668: 0 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1750: 1 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/1848: 0 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/209: 1 overlaps
Network Map Of SARS CoV 2 Signaling WP5115 x https://identifiers.org/aop.events/618: 0 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1087: 2 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/149: 2 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1495: 1 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1497: 2 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1579: 1 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1633: 2 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1668: 1 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1750: 2 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/1848: 1 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/209: 1 overlaps
Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341 x https://identifiers.org/aop.events/618: 0 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1087: 1 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/149: 1 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1495: 0 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1497: 1 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1579: 1 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1633: 1 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1668: 0 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1750: 1 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/1848: 0 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/209: 3 overlaps
P53 Transcriptional Gene Network WP4963 x https://identifiers.org/aop.events/618: 0 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1087: 2 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/149: 2 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1495: 1 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1497: 2 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1579: 1 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1633: 2 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1668: 1 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1750: 2 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/1848: 1 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/209: 1 overlaps
Platelet Mediated Interactions W Vascular And Circulating Cells WP4462 x https://identifiers.org/aop.events/618: 0 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1087: 1 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/149: 1 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1495: 0 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1497: 1 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1579: 2 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1633: 1 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1668: 0 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1750: 1 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/1848: 0 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/209: 1 overlaps
SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039 x https://identifiers.org/aop.events/618: 0 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1087: 2 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/149: 2 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1495: 1 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1497: 2 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1579: 1 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1633: 2 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1668: 1 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1750: 2 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/1848: 1 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/209: 1 overlaps
Spinal Cord Injury WP2431 x https://identifiers.org/aop.events/618: 0 overlaps
title of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Burn Wound Healing WP5055, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): set(), number: 0
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CXCL5', 'CCL2'}, number: 2
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): set(), number: 0
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): set(), number: 0
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Cytokine Cytokine Receptor Interaction WP5473, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Fibrin Complement Receptor 3 Signaling WP4136, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): set(), number: 0
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Glucocorticoid Receptor Pathway WP2880, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): set(), number: 0
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CXCL5', 'CCL2'}, number: 2
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): set(), number: 0
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): set(), number: 0
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Immune Infiltration In Pancreatic Cancer WP5285, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: LDL Influence On CD14 And TLR4 WP5272, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): set(), number: 0
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CXCL5', 'CCL2'}, number: 2
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): set(), number: 0
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): set(), number: 0
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Network Map Of SARS CoV 2 Signaling WP5115, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): set(), number: 0
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): set(), number: 0
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): set(), number: 0
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2', 'SERPINB5', 'ULBP1'}, number: 3
Term: P53 Transcriptional Gene Network WP4963, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Platelet Mediated Interactions W Vascular And Circulating Cells WP4462, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): set(), number: 0
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CXCL5', 'CCL2'}, number: 2
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): set(), number: 0
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): set(), number: 0
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1579, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1668, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TLR4', 'CCL2'}, number: 2
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/1848, Title of overlapping gene(s): {'TLR4'}, number: 1
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CCL2'}, number: 1
Term: Spinal Cord Injury WP2431, KEID: https://identifiers.org/aop.events/618, Title of overlapping gene(s): set(), number: 0
Section 6.5.4 Tabulation gene overlap
In this section, a table is created that contains the number of overlapping genes and number of total genes in preparation for section 6.5.5.
final_geneoverlaptable_AG_48H=pd.DataFrame.from_dict(overlapping_genes_betweenORA_and_significantKEs2,orient='index')
Section 6.5.5 Percent overlap calculation
In this section, the percent overlap for the gene sets is calculated.
Step 42. Lastly, the percent overlap is calculated and the result is added as a column to the dataframe. First, a for loop counts the total number of genes belonging to each enriched ORA pathway.
variable_count2= {}
for index, row in exploded_df2_ORApathwaytable.iterrows():
    unique_KE = row['Term']
    gene_expression_value = row['Genes']
    if unique_KE not in variable_count2:
        variable_count2[unique_KE] = 1
    else:
        variable_count2[unique_KE] += 1
print("The total number of genes: ")
print(variable_count2)
The total number of genes:
{'Platelet Mediated Interactions W Vascular And Circulating Cells WP4462': 2, 'P53 Transcriptional Gene Network WP4963': 3, 'Network Map Of SARS CoV 2 Signaling WP5115': 4, 'LDL Influence On CD14 And TLR4 WP5272': 2, 'Spinal Cord Injury WP2431': 3, 'Interactions Immune Cells And miRNAs In Tumor Microenvironment WP4559': 2, 'Immune Infiltration In Pancreatic Cancer WP5285': 2, 'Fibrin Complement Receptor 3 Signaling WP4136': 2, 'SARS CoV 2 Innate Immunity Evasion And Cell Immune Response WP5039': 2, 'Glucocorticoid Receptor Pathway WP2880': 2, 'Non Genomic Actions Of 1 25 Dihydroxyvitamin D3 WP4341': 2, 'Burn Wound Healing WP5055': 2, 'Cytokine Cytokine Receptor Interaction WP5473': 3}
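The counting loop in Step 42 tallies one row per gene for each pathway term. As a minimal sketch of the same idea in plain Python, assuming the exploded table is represented by a hypothetical list of (Term, gene) pairs, collections.Counter produces the per-term totals. The term and gene names below are placeholders, not values from the ORA table.

```python
from collections import Counter

# Hypothetical (Term, gene) rows standing in for the exploded ORA table.
toy_rows = [
    ("Pathway A", "GENE1"),
    ("Pathway A", "GENE2"),
    ("Pathway A", "GENE3"),
    ("Pathway B", "GENE1"),
    ("Pathway B", "GENE4"),
]

# Count how many gene rows belong to each term (one row per gene).
genes_per_term = Counter(term for term, _gene in toy_rows)

print(dict(genes_per_term))
```

In pandas, calling value_counts() on the 'Term' column of the exploded table would be expected to give the same totals without an explicit loop.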
Step 43. The result is converted into a dataframe and added to the final dataframe. This is followed by some data manipulation prior to calculation of gene set overlap.
variable_count_df2=pd.DataFrame.from_dict(variable_count2,orient='index')
reset_variable_count_df2 = variable_count_df2.reset_index()
reset_variable_count_df2.columns = ['Term', 'Total number of genes']
Genesetoverlaptable_AG48H=final_geneoverlaptable_AG_48H.reset_index(level=[1])
Genesetoverlaptable_AG48H.reset_index(inplace=True)
Genesetoverlaptable_AG48H.columns= ['Term','KEID','overlapping genes','number of genes that overlap']
tabulation_Ag48h=pd.merge(reset_variable_count_df2,Genesetoverlaptable_AG48H, on='Term')
def calculate_Genesetoverlap_Score(row):
    return f"{(row['number of genes that overlap']/row['Total number of genes'])*100}"
tabulation_Ag48h.loc[:,'Percent geneset overlap']= tabulation_Ag48h.apply(calculate_Genesetoverlap_Score, axis=1)
tabulation_Ag48h.to_excel('geneoverlap-calculation-Ag48h.xlsx')
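As a worked illustration of the percent overlap formula, the sketch below uses the counts reported above for 'Spinal Cord Injury WP2431' (3 genes in total, with 2, 1 and 0 genes overlapping KE 1087, KE 1495 and KE 618). The helper name percent_geneset_overlap is hypothetical, and unlike the notebook function it returns a float rather than a formatted string.

```python
def percent_geneset_overlap(n_overlap: int, n_total: int) -> float:
    """Percent of the genes listed for an ORA pathway that also occur in a KE gene set."""
    return n_overlap / n_total * 100


# Counts reported above for 'Spinal Cord Injury WP2431' (3 genes in total).
total_genes = 3
overlap_by_ke = {
    "https://identifiers.org/aop.events/1087": 2,
    "https://identifiers.org/aop.events/1495": 1,
    "https://identifiers.org/aop.events/618": 0,
}

for keid, n_overlap in overlap_by_ke.items():
    print(keid, round(percent_geneset_overlap(n_overlap, total_genes), 2))
```

Keeping the result numeric makes later sorting or filtering on the percentage straightforward.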
Section 7. Comparison 3: AgNP 24H
Section 7.1 Calculation of n variable
In this section, variable n is calculated for comparison 3.
Step 44. The table containing the differentially expressed genes for AgNP 24H versus the H2O control is loaded and filtered for significance (adjusted p-value below 0.05).
AgNP_24H_DEG= pd.read_csv('topTable_AgNP_12.1_24 - H2O.control_.0.0_24.tsv',sep='\t')
AgNP24H_DEG= AgNP_24H_DEG[AgNP_24H_DEG['adj. p-value'] < 0.05]
AgNP24h_DEG= AgNP24H_DEG.copy()
AgNP24h_DEG.rename(columns={AgNP24h_DEG.columns[0]: 'Entrez.Gene'}, inplace=True)
AgNP24h_DEG['Entrez.Gene'] = AgNP24h_DEG['Entrez.Gene'].astype(str)
Step 45. Here, the results of the DEG table are integrated into the mergeddataframe dataframe. This is followed by adjustment of the dataframe columns to remove non-relevant columns.
merged_dataframe_DEG_AgNP_24h= pd.merge(mergeddataframe,AgNP24h_DEG, on='Entrez.Gene')
Step 46. Lastly, the following for loop over the key events is run to retrieve the n variable. It is comparable to the for loop for N, but adds a condition that checks gene significance: the adjusted p-value must be smaller than 0.05.
variable_n_dictionary_count3= {}
for index, row in merged_dataframe_DEG_AgNP_24h.iterrows():
    unique_KE = row['KEID']
    gene_expression_value = row['adj. p-value']
    if gene_expression_value < 0.05:
        if unique_KE not in variable_n_dictionary_count3:
            variable_n_dictionary_count3[unique_KE] = 1
        else:
            variable_n_dictionary_count3[unique_KE] += 1
print("The total number of significant genes: ")
Step 47. The output of the n variable dictionary is saved as a dataframe and integrated as a separate column into a dataframe.
n_variable_dataframe3=pd.DataFrame.from_dict(variable_n_dictionary_count3,orient='index')
n_variable_dataframe3_reset = n_variable_dataframe3.reset_index()
n_variable_dataframe3_reset.columns = ['KEID', 'n']
n_variable_dataframe3_reset
 | KEID | n |
---|---|---|
0 | https://identifiers.org/aop.events/486 | 40 |
1 | https://identifiers.org/aop.events/875 | 51 |
2 | https://identifiers.org/aop.events/2007 | 33 |
3 | https://identifiers.org/aop.events/1495 | 49 |
4 | https://identifiers.org/aop.events/105 | 8 |
... | ... | ... |
96 | https://identifiers.org/aop.events/1820 | 22 |
97 | https://identifiers.org/aop.events/896 | 20 |
98 | https://identifiers.org/aop.events/1549 | 35 |
99 | https://identifiers.org/aop.events/357 | 9 |
100 | https://identifiers.org/aop.events/352 | 76 |
101 rows × 2 columns
merged_dataframe3= pd.merge(mergeddataframeDEG, n_variable_dataframe3_reset, on='KEID')
Section 7.2. Calculation of variable B and variable b.
In this section, variable B and variable b are calculated.
Step 48. Variable B is calculated by taking the length of the dataframe that includes all genes in the DEG table.
B=len(AgNP_24H_DEG.index)
B
20518
Step 49. Variable b is calculated by taking the length of the same DEG table after filtering for significance (adjusted p-value below 0.05).
AgNP_24H_DEG_filtered=AgNP_24H_DEG[AgNP_24H_DEG['adj. p-value'] < 0.05]
b=len(AgNP_24H_DEG_filtered)
b
6213
Section 7.3. Calculation of enrichment score and hypergeometric p-value
In this section, the enrichment score and hypergeometric p-value will be calculated. This requires the four variables of the enrichment score per KE; the formula is applied to these variables and the results are stored in an additional dataframe.
Step 50. The final dataframe will be created that contains the KEID and the four variables: variable N, variable n, variable B and variable b.
Final_dataframe_ES= merged_dataframe3.loc[:, ['KEID','N','n']]
Final_dataframe_ES['B']=B
Final_dataframe_ES['b']=b
Final_Dataframe_ES=Final_dataframe_ES.drop_duplicates(subset=['KEID'],keep='first')
Final_Dataframe_ES.reset_index(drop=True,inplace=True)
Copy_Final_DataFrame_ES=Final_Dataframe_ES.copy()
Step 51. The following function is applied row-wise to calculate the enrichment score, (n/N)/(b/B), for individual key events, and the results are saved as a separate column in the dataframe.
def calculate_Enrichment_Score(row):
    return f"{(row['n']/row['N'])/(row['b']/row['B'])}"
Copy_Final_DataFrame_ES.loc[:,'Enrichmentscore']= Copy_Final_DataFrame_ES.apply(calculate_Enrichment_Score,axis=1)
Copy_Final_DataFrame_ES
 | KEID | N | n | B | b | Enrichmentscore |
---|---|---|---|---|---|---|
0 | https://identifiers.org/aop.events/1495 | 253 | 49 | 20518 | 6213 | 0.6396011423198457 |
1 | https://identifiers.org/aop.events/1668 | 156 | 33 | 20518 | 6213 | 0.6985910435934579 |
2 | https://identifiers.org/aop.events/244 | 417 | 135 | 20518 | 6213 | 1.069132139966443 |
3 | https://identifiers.org/aop.events/41 | 275 | 103 | 20518 | 6213 | 1.236910290739359 |
4 | https://identifiers.org/aop.events/1539 | 170 | 48 | 20518 | 6213 | 0.932450933053086 |
5 | https://identifiers.org/aop.events/618 | 240 | 60 | 20518 | 6213 | 0.8256075969740866 |
6 | https://identifiers.org/aop.events/1497 | 528 | 153 | 20518 | 6213 | 0.9569542601290549 |
7 | https://identifiers.org/aop.events/1115 | 34 | 13 | 20518 | 6213 | 1.2626939718427206 |
8 | https://identifiers.org/aop.events/1917 | 166 | 65 | 20518 | 6213 | 1.2931203326100151 |
9 | https://identifiers.org/aop.events/1633 | 1056 | 306 | 20518 | 6213 | 0.9569542601290549 |
10 | https://identifiers.org/aop.events/1392 | 102 | 39 | 20518 | 6213 | 1.2626939718427206 |
11 | https://identifiers.org/aop.events/1582 | 51 | 16 | 20518 | 6213 | 1.0360565922812066 |
12 | https://identifiers.org/aop.events/1896 | 205 | 66 | 20518 | 6213 | 1.0632214907373603 |
13 | https://identifiers.org/aop.events/265 | 268 | 76 | 20518 | 6213 | 0.9365101100004564 |
14 | https://identifiers.org/aop.events/1750 | 528 | 153 | 20518 | 6213 | 0.9569542601290549 |
15 | https://identifiers.org/aop.events/1848 | 195 | 42 | 20518 | 6213 | 0.7112926989315208 |
16 | https://identifiers.org/aop.events/890 | 34 | 13 | 20518 | 6213 | 1.2626939718427206 |
17 | https://identifiers.org/aop.events/149 | 1056 | 306 | 20518 | 6213 | 0.9569542601290549 |
18 | https://identifiers.org/aop.events/1579 | 353 | 91 | 20518 | 6213 | 0.8513347458882933 |
19 | https://identifiers.org/aop.events/249 | 34 | 13 | 20518 | 6213 | 1.2626939718427206 |
20 | https://identifiers.org/aop.events/288 | 51 | 19 | 20518 | 6213 | 1.230317203333933 |
21 | https://identifiers.org/aop.events/209 | 617 | 216 | 20518 | 6213 | 1.1561182557303256 |
22 | https://identifiers.org/aop.events/1945 | 1218 | 332 | 20518 | 6213 | 0.9001698594265903 |
23 | https://identifiers.org/aop.events/1087 | 528 | 153 | 20518 | 6213 | 0.9569542601290549 |
24 | https://identifiers.org/aop.events/1538 | 34 | 13 | 20518 | 6213 | 1.2626939718427206 |
25 | https://identifiers.org/aop.events/341 | 10 | 5 | 20518 | 6213 | 1.6512151939481732 |
26 | https://identifiers.org/aop.events/1090 | 459 | 129 | 20518 | 6213 | 0.9281340305852477 |
27 | https://identifiers.org/aop.events/352 | 398 | 76 | 20518 | 6213 | 0.6306148479400058 |
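To make the enrichment score concrete, the sketch below recomputes two rows of the table above from their four variables. The function name enrichment_score is hypothetical and it returns a float instead of the text value produced in the notebook.

```python
def enrichment_score(n: int, N: int, b: int, B: int) -> float:
    """(n / N) / (b / B): fraction of significant genes in the KE relative to the background."""
    return (n / N) / (b / B)


B, b = 20518, 6213  # all genes and significant genes in the DEG table (Steps 48 and 49)

# KE 1495 from the table above: N = 253 genes, n = 49 significant genes
es_1495 = enrichment_score(n=49, N=253, b=b, B=B)
# KE 341 from the table above: N = 10 genes, n = 5 significant genes
es_341 = enrichment_score(n=5, N=10, b=b, B=B)

print(es_1495, es_341)
```

A score above 1 means that the key event contains a larger fraction of significant genes than the background fraction b/B, which is roughly 0.30 for this comparison.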
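Step 52. The following for loop calculates the hypergeometric p-value for each key event and the result is saved as a separate column in the dataframe. This requires some intermediate steps for manipulation of the dataframe. In the loop, M is the total number of genes (B), n is the number of significant genes (b), N is the number of genes in the key event, and k is the number of significant genes in the key event. Note that the code uses pmf(k), which returns the probability of observing exactly k significant genes.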
p_value_dataframe3=[]
for index, row in Copy_Final_DataFrame_ES.iterrows():
    M = row['B']
    n = row['b']
    N = row['N']
    k = row['n']
    hpd = ss.hypergeom(M, n, N)
    p = hpd.pmf(k)
    p_value_dataframe3.append(p)
Hypergeometricpvalue_dataframe3=pd.DataFrame(p_value_dataframe3)
Hypergeometricpvalue_dataframe3.columns= ['Hypergeometric p-value']
merged_finaltable_AgNp_24h=pd.concat([Copy_Final_DataFrame_ES,Hypergeometricpvalue_dataframe3],axis=1)
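The value computed in Step 52 is the probability of exactly k significant genes. Over-representation tests often report the upper tail instead, P(X >= k), which in scipy.stats.hypergeom corresponds to sf(k - 1). The sketch below is a plain-Python illustration of the difference for KE 41, using math.comb rather than SciPy; the function names are hypothetical, and which statistic is appropriate remains a design decision for the analysis.

```python
from math import comb


def hypergeom_pmf(k: int, M: int, K: int, N: int) -> float:
    """P(X = k) when drawing N genes from M genes, of which K are significant."""
    return comb(K, k) * comb(M - K, N - k) / comb(M, N)


def hypergeom_upper_tail(k: int, M: int, K: int, N: int) -> float:
    """P(X >= k): probability of at least k significant genes among the N drawn."""
    return sum(hypergeom_pmf(i, M, K, N) for i in range(k, min(K, N) + 1))


# KE 41 from the enrichment table: B = 20518, b = 6213, N = 275, n = 103
M, K, N, k = 20518, 6213, 275, 103
point_probability = hypergeom_pmf(k, M, K, N)
tail_probability = hypergeom_upper_tail(k, M, K, N)

print(point_probability, tail_probability)
```

The tail probability is always at least as large as the point probability, so a key event that passes the 0.05 threshold on the point probability does not necessarily pass it on the tail.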
Section 7.4. Filtering the results for significant KEs
In this section, the results will be filtered to only include significant KEs. Significant KEs have an enrichment score above 1 and a hypergeometric p-value below 0.05.
Step 53. Lastly, we filter the results to showcase the significant KEs for the comparison: AgNP 24H.
filteredversion_AgNP_24h= merged_finaltable_AgNp_24h[(pd.to_numeric(merged_finaltable_AgNp_24h['Enrichmentscore']) > 1)& (merged_finaltable_AgNp_24h['Hypergeometric p-value'] < 0.05)]
filteredversion_AgNP_24h
 | KEID | N | n | B | b | Enrichmentscore | Hypergeometric p-value |
---|---|---|---|---|---|---|---|
2 | https://identifiers.org/aop.events/244 | 417 | 135 | 20518 | 6213 | 1.069132139966443 | 0.027258 |
3 | https://identifiers.org/aop.events/41 | 275 | 103 | 20518 | 6213 | 1.236910290739359 | 0.001904 |
8 | https://identifiers.org/aop.events/1917 | 166 | 65 | 20518 | 6213 | 1.2931203326100151 | 0.003231 |
10 | https://identifiers.org/aop.events/1392 | 102 | 39 | 20518 | 6213 | 1.2626939718427206 | 0.018653 |
21 | https://identifiers.org/aop.events/209 | 617 | 216 | 20518 | 6213 | 1.1561182557303256 | 0.001287 |
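The Enrichmentscore column holds text, because calculate_Enrichment_Score formats its result with an f-string. Comparing such text values against str(1) is lexicographic rather than numeric. The sketch below shows where the two comparisons agree and where they differ; the first two scores come from the table in Step 51, while "1.0" and "5e-05" are hypothetical edge cases that do not occur in this dataset.

```python
# Scores as text: two values from the enrichment table plus two hypothetical edge cases.
scores_as_text = ["1.236910290739359", "0.932450933053086", "1.0", "5e-05"]

comparison = {}
for text in scores_as_text:
    as_text = text > str(1)       # lexicographic comparison, character by character
    as_number = float(text) > 1   # numeric comparison
    comparison[text] = (as_text, as_number)
    print(text, as_text, as_number)
```

For the scores in this comparison both approaches select the same key events, but converting the column with pd.to_numeric before filtering avoids the edge cases.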
# Work on an explicit copy so the column assignments below do not raise SettingWithCopyWarning
filteredversion_AgNP_24h = filteredversion_AgNP_24h.copy()
# Ensure numeric types
filteredversion_AgNP_24h['Hypergeometric p-value'] = pd.to_numeric(filteredversion_AgNP_24h['Hypergeometric p-value'], errors='coerce')
filteredversion_AgNP_24h['Enrichmentscore'] = pd.to_numeric(filteredversion_AgNP_24h['Enrichmentscore'], errors='coerce')
filteredversion_AgNP_24h['combined_score'] = -np.log10(filteredversion_AgNP_24h['Hypergeometric p-value']) * filteredversion_AgNP_24h['Enrichmentscore']
# Sort by combined score (highest first)
C3_sorted = filteredversion_AgNP_24h.sort_values(by='combined_score', ascending=False)
# Save the ranked table to Excel
C3_sorted.to_excel('ConsistentKE-AgNP24h.xlsx')
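As a worked illustration of the combined score, -log10(p-value) multiplied by the enrichment score, the sketch below ranks the five significant KEs using the values displayed in the filtered table. The p-values are rounded to six decimals in that table, so the resulting scores are approximate; the variable names are hypothetical.

```python
from math import log10

# (KE number, enrichment score, hypergeometric p-value) as displayed in the filtered table.
significant_kes = [
    ("244", 1.069132139966443, 0.027258),
    ("41", 1.236910290739359, 0.001904),
    ("1917", 1.2931203326100151, 0.003231),
    ("1392", 1.2626939718427206, 0.018653),
    ("209", 1.1561182557303256, 0.001287),
]

# combined score = -log10(p) * enrichment score
combined = {ke: -log10(p) * es for ke, es, p in significant_kes}

# Rank from highest to lowest combined score.
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

The score rewards key events that have both a small p-value and a high enrichment score, so a KE with the smallest p-value is not automatically ranked first.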
Section 7.5. Calculation of percent gene overlap to ORA
Section 7.5.1 Creation of the significant KEs table
In this section, you merge the dataframes to retrieve the genes connected to only the significant KEs.
Step 54. The significant KE table is created by selecting the significant KEs from the previously created mergeddataframe_final2 dataframe.
significant_KEIDs3=['https://identifiers.org/aop.events/244','https://identifiers.org/aop.events/41','https://identifiers.org/aop.events/1917','https://identifiers.org/aop.events/1392','https://identifiers.org/aop.events/209']
significantKEID_genetable3=mergeddataframe_final2[mergeddataframe_final2['KEID'].isin(significant_KEIDs3)]
significantKEIDgenetable3=significantKEID_genetable3.drop(columns=['WPtitle','ID'])
significantKEIDgenetable3
 | KEID | gene | Entrez.Gene |
---|---|---|---|
1221 | https://identifiers.org/aop.events/244 | CASP2 | 835 |
1222 | https://identifiers.org/aop.events/244 | RTCB | 51493 |
1223 | https://identifiers.org/aop.events/244 | BCL2 | 596 |
1224 | https://identifiers.org/aop.events/244 | BCL2 | 100049703 |
1225 | https://identifiers.org/aop.events/244 | BCL2L11 | 10018 |
... | ... | ... | ... |
13650 | https://identifiers.org/aop.events/209 | FANCD2 | 2177 |
13651 | https://identifiers.org/aop.events/209 | RPA1 | 6117 |
13652 | https://identifiers.org/aop.events/209 | PCNA | 5111 |
13653 | https://identifiers.org/aop.events/209 | RFC3 | 5983 |
13654 | https://identifiers.org/aop.events/209 | FAAP24 | 91442 |
1577 rows × 3 columns
Section 7.5.2 Significant ORA pathway table plus splitting
In this section, the significant ORA pathway table is created.
Step 55. The significant ORA pathway table is created using the significantly enriched pathways identified from the ORA analysis. This requires data manipulation to restructure the table so that the individual genes of each enriched pathway are placed on individual rows.
datafile_ORA3 = pd.read_csv("C:/Users/shaki/Downloads/ORA_tables_for_comparison/Comparison 3-AgNP-24H.txt", sep='\t')
datafileORA3=pd.DataFrame(datafile_ORA3)
filtereddatafileORA_3=datafileORA3[datafileORA3['Adjusted P-value'] < 0.05]
filtereddatafileORA_3
 | Gene_set | Term | P-value | Adjusted P-value | Old P-value | Old adjusted P-value | Odds Ratio | Combined Score | Genes |
---|---|---|---|---|---|---|---|---|---|
0 | WikiPathways_2024_Human | Ciliopathies WP4803 | 0.000002 | 0.001669 | 0 | 0 | 2.050893 | 26.87625 | INVS;GALNT11;DYNC2I1;ODAD4;TRAF3IP1;IFT172;CEP... |
1 | WikiPathways_2024_Human | Genes Related To Primary Cilium Development Ba... | 0.000006 | 0.002491 | 0 | 0 | 2.477324 | 29.75603 | DYNC2I1;TTC23;TRAF3IP1;IFT172;CEP19;CEP120;CBY... |
2 | WikiPathways_2024_Human | Pluripotent Stem Cell Differentiation Pathway ... | 0.000100 | 0.020127 | 0 | 0 | 3.121977 | 28.75355 | ALK;CSF1R;EPO;PDGFA;FGF1;FGF4;INS;NT5E;FGF8;CX... |
3 | WikiPathways_2024_Human | Bardet Biedl Syndrome WP5234 | 0.000119 | 0.020127 | 0 | 0 | 2.299628 | 20.77129 | INVS;DYNC2I1;CEP104;TRAF3IP1;IFT172;PKD1L1;PKD... |
4 | WikiPathways_2024_Human | NRF2 Pathway WP2884 | 0.000143 | 0.020127 | 0 | 0 | 1.918783 | 16.99291 | SERPINA1;HSP90AB1;SRXN1;SLC2A1;KEAP1;SLC2A2;SL... |
5 | WikiPathways_2024_Human | Nuclear Receptors Meta Pathway WP2882 | 0.000147 | 0.020127 | 0 | 0 | 1.556285 | 13.73149 | KEAP1;IRS2;AHR;RGS2;SCP2;FTH1;PDK4;CYP1B1;ACAA... |
6 | WikiPathways_2024_Human | Photodynamic Therapy Induced NFE2L2 NRF2 Survi... | 0.000268 | 0.029633 | 0 | 0 | 4.723248 | 38.84128 | ABCC3;ABCC4;JUN;ABCC2;SRXN1;EPHX1;ABCC6;KEAP1;... |
7 | WikiPathways_2024_Human | Proximal Tubule Transport WP4917 | 0.000289 | 0.029633 | 0 | 0 | 2.611755 | 21.28253 | ATP6V1A;SLC47A1;SLC1A1;SLC2A1;SLC2A2;SLC5A1;SL... |
8 | WikiPathways_2024_Human | Osteoblast Differentiation And Related Disease... | 0.000326 | 0.029714 | 0 | 0 | 1.950916 | 15.66235 | IHH;FZD10;FGF1;FGF3;GLI3;FGF4;PIK3C2B;GLI2;FGF... |
9 | WikiPathways_2024_Human | Vitamin D Receptor Pathway WP2877 | 0.000375 | 0.030776 | 0 | 0 | 1.708108 | 13.47314 | ITGAM;IL25;HILPDA;TNFAIP3;GXYLT2;SLC2A4;TREM1;... |
10 | WikiPathways_2024_Human | G1 To S Cell Cycle Control WP45 | 0.000550 | 0.041030 | 0 | 0 | 2.368871 | 17.77806 | CDKN1C;PCNA;MCM7;ATF6B;PRIM1;CCND3;CCNB1;CCND2... |
# Make sure 'Combined Score' is numeric
datafileORA3['Combined Score'] = pd.to_numeric(datafileORA3['Combined Score'], errors='coerce')
# Sort by 'Combined Score' in descending order
ranked_df3 = datafileORA3.sort_values(by='Combined Score', ascending=False)
# (Optional) Save to Excel
ranked_df3.to_excel('AgNP24H-ORAtable-thesis(EMEXP3583).xlsx', index=False)
dropped_datafileORA_df3=filtereddatafileORA_3.drop(['Adjusted P-value','Odds Ratio','Old P-value','Gene_set','P-value','Old adjusted P-value','Combined Score'],axis=1)
droppeddatafileORAdf3=dropped_datafileORA_df3.copy()
droppeddatafileORAdf3['Genes']= droppeddatafileORAdf3['Genes'].replace({';':','},regex=True)
df3_ORApathwaytable=droppeddatafileORAdf3.copy()
df3_ORApathwaytable['Genes'] = df3_ORApathwaytable['Genes'].astype(str)
df3_ORApathwaytable['Genes'] = df3_ORApathwaytable['Genes'].str.split(',')
exploded_df3_ORApathwaytable = df3_ORApathwaytable.explode('Genes', ignore_index=True)
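The restructuring in Step 55 turns one row per pathway, with all genes in a single delimited string, into one row per gene. A minimal plain-Python sketch of that reshaping is given below; the pathway and gene names are placeholders, not entries from the ORA table.

```python
# Hypothetical ORA rows: each term carries its genes in one ';'-separated string.
toy_ora_rows = [
    {"Term": "Pathway A", "Genes": "GENE1;GENE2;GENE3"},
    {"Term": "Pathway B", "Genes": "GENE2;GENE4"},
]

# Put every gene on its own (Term, gene) row, mirroring the split and explode steps.
exploded_rows = [
    (row["Term"], gene)
    for row in toy_ora_rows
    for gene in row["Genes"].split(";")
]

for term, gene in exploded_rows:
    print(term, gene)
```

The notebook reaches the same shape by replacing ';' with ',', splitting the string into a list, and calling explode on the 'Genes' column.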
Section 7.5.3 For loop to get overlapping genes
In this section, the number of overlapping genes between the significant enrichment score-based Key Events and the enriched pathways from ORA is calculated.
Step 56. Next, two sets are created by converting the significant KE table and ORA pathway table into dictionaries where the values of the genes are grouped together per key. This is followed by running a for loop to calculate the number of overlapping genes along with the symbols.
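The core of this step is a set intersection per (pathway, key event) pair. The sketch below illustrates it with only the genes that are visible in the truncated tables above for 'G1 To S Cell Cycle Control WP45' and KE 209, so it does not reproduce the full overlap for that pair; the variable names are hypothetical.

```python
# Partial gene sets, limited to the genes visible in the truncated tables above.
ora_genes_partial = {"CDKN1C", "PCNA", "MCM7", "ATF6B", "PRIM1", "CCND3", "CCNB1", "CCND2"}
ke209_genes_partial = {"FANCD2", "RPA1", "PCNA", "RFC3", "FAAP24"}

# Genes present in both the ORA pathway and the key event gene set.
overlap = ora_genes_partial.intersection(ke209_genes_partial)

print(overlap, len(overlap))
```

Because both inputs are sets, duplicate gene symbols within a pathway or key event are counted only once.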
ORA_gene_sets3 = exploded_df3_ORApathwaytable.groupby('Term')['Genes'].apply(set).to_dict()
SignificantKE_gene_sets3 = significantKEIDgenetable3.groupby('KEID')['gene'].apply(set).to_dict()
overlapping_genes_betweenORA_and_significantKEs3 = {}
for term, ORA_genes in ORA_gene_sets3.items():
    for KEID, KEID_genes in SignificantKE_gene_sets3.items():
        overlap = ORA_genes.intersection(KEID_genes)
        print(f"{term} x {KEID}: {len(overlap)} overlaps")
        overlapping_genes_betweenORA_and_significantKEs3[(term, KEID)] = {
            'overlapping genes': overlap,
            'number of genes that overlap': len(overlap)
        }
if overlapping_genes_betweenORA_and_significantKEs3:
    print("\ntitle of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:")
    for (term, KEID), result in overlapping_genes_betweenORA_and_significantKEs3.items():
        print(f"Term: {term}, KEID: {KEID}, Title of overlapping gene(s): {result['overlapping genes']}, number: {result['number of genes that overlap']}")
else:
    print("No overlapping genes")
Bardet Biedl Syndrome WP5234 x https://identifiers.org/aop.events/1392: 0 overlaps
Bardet Biedl Syndrome WP5234 x https://identifiers.org/aop.events/1917: 0 overlaps
Bardet Biedl Syndrome WP5234 x https://identifiers.org/aop.events/209: 1 overlaps
Bardet Biedl Syndrome WP5234 x https://identifiers.org/aop.events/244: 0 overlaps
Bardet Biedl Syndrome WP5234 x https://identifiers.org/aop.events/41: 0 overlaps
Ciliopathies WP4803 x https://identifiers.org/aop.events/1392: 0 overlaps
Ciliopathies WP4803 x https://identifiers.org/aop.events/1917: 0 overlaps
Ciliopathies WP4803 x https://identifiers.org/aop.events/209: 1 overlaps
Ciliopathies WP4803 x https://identifiers.org/aop.events/244: 0 overlaps
Ciliopathies WP4803 x https://identifiers.org/aop.events/41: 0 overlaps
G1 To S Cell Cycle Control WP45 x https://identifiers.org/aop.events/1392: 0 overlaps
G1 To S Cell Cycle Control WP45 x https://identifiers.org/aop.events/1917: 0 overlaps
G1 To S Cell Cycle Control WP45 x https://identifiers.org/aop.events/209: 6 overlaps
G1 To S Cell Cycle Control WP45 x https://identifiers.org/aop.events/244: 10 overlaps
G1 To S Cell Cycle Control WP45 x https://identifiers.org/aop.events/41: 3 overlaps
Genes Related To Primary Cilium Development Based On CRISPR WP4536 x https://identifiers.org/aop.events/1392: 0 overlaps
Genes Related To Primary Cilium Development Based On CRISPR WP4536 x https://identifiers.org/aop.events/1917: 0 overlaps
Genes Related To Primary Cilium Development Based On CRISPR WP4536 x https://identifiers.org/aop.events/209: 0 overlaps
Genes Related To Primary Cilium Development Based On CRISPR WP4536 x https://identifiers.org/aop.events/244: 0 overlaps
Genes Related To Primary Cilium Development Based On CRISPR WP4536 x https://identifiers.org/aop.events/41: 0 overlaps
NRF2 Pathway WP2884 x https://identifiers.org/aop.events/1392: 5 overlaps
NRF2 Pathway WP2884 x https://identifiers.org/aop.events/1917: 60 overlaps
NRF2 Pathway WP2884 x https://identifiers.org/aop.events/209: 60 overlaps
NRF2 Pathway WP2884 x https://identifiers.org/aop.events/244: 60 overlaps
NRF2 Pathway WP2884 x https://identifiers.org/aop.events/41: 60 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1392: 6 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1917: 61 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/209: 69 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/244: 66 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/41: 70 overlaps
Osteoblast Differentiation And Related Diseases WP4787 x https://identifiers.org/aop.events/1392: 1 overlaps
Osteoblast Differentiation And Related Diseases WP4787 x https://identifiers.org/aop.events/1917: 0 overlaps
Osteoblast Differentiation And Related Diseases WP4787 x https://identifiers.org/aop.events/209: 14 overlaps
Osteoblast Differentiation And Related Diseases WP4787 x https://identifiers.org/aop.events/244: 11 overlaps
Osteoblast Differentiation And Related Diseases WP4787 x https://identifiers.org/aop.events/41: 2 overlaps
Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612 x https://identifiers.org/aop.events/1392: 4 overlaps
Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612 x https://identifiers.org/aop.events/1917: 7 overlaps
Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612 x https://identifiers.org/aop.events/209: 10 overlaps
Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612 x https://identifiers.org/aop.events/244: 8 overlaps
Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612 x https://identifiers.org/aop.events/41: 8 overlaps
Pluripotent Stem Cell Differentiation Pathway WP2848 x https://identifiers.org/aop.events/1392: 0 overlaps
Pluripotent Stem Cell Differentiation Pathway WP2848 x https://identifiers.org/aop.events/1917: 1 overlaps
Pluripotent Stem Cell Differentiation Pathway WP2848 x https://identifiers.org/aop.events/209: 4 overlaps
Pluripotent Stem Cell Differentiation Pathway WP2848 x https://identifiers.org/aop.events/244: 4 overlaps
Pluripotent Stem Cell Differentiation Pathway WP2848 x https://identifiers.org/aop.events/41: 1 overlaps
Proximal Tubule Transport WP4917 x https://identifiers.org/aop.events/1392: 0 overlaps
Proximal Tubule Transport WP4917 x https://identifiers.org/aop.events/1917: 9 overlaps
Proximal Tubule Transport WP4917 x https://identifiers.org/aop.events/209: 9 overlaps
Proximal Tubule Transport WP4917 x https://identifiers.org/aop.events/244: 9 overlaps
Proximal Tubule Transport WP4917 x https://identifiers.org/aop.events/41: 9 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1392: 2 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/1917: 4 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/209: 10 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/244: 5 overlaps
Vitamin D Receptor Pathway WP2877 x https://identifiers.org/aop.events/41: 4 overlaps
title of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:
Term: Bardet Biedl Syndrome WP5234, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Bardet Biedl Syndrome WP5234, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): set(), number: 0
Term: Bardet Biedl Syndrome WP5234, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'INVS'}, number: 1
Term: Bardet Biedl Syndrome WP5234, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): set(), number: 0
Term: Bardet Biedl Syndrome WP5234, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): set(), number: 0
Term: Ciliopathies WP4803, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Ciliopathies WP4803, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): set(), number: 0
Term: Ciliopathies WP4803, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'INVS'}, number: 1
Term: Ciliopathies WP4803, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): set(), number: 0
Term: Ciliopathies WP4803, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): set(), number: 0
Term: G1 To S Cell Cycle Control WP45, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: G1 To S Cell Cycle Control WP45, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): set(), number: 0
Term: G1 To S Cell Cycle Control WP45, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'GADD45A', 'POLE2', 'PCNA', 'CCND3', 'CCND2', 'CDC25A'}, number: 6
Term: G1 To S Cell Cycle Control WP45, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'GADD45A', 'CCNG2', 'TP53', 'CDC25A', 'CCNB1', 'CDK1', 'MDM2', 'CCND3', 'CCND2', 'CDK6'}, number: 10
Term: G1 To S Cell Cycle Control WP45, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'CCNA1', 'CCNB1', 'TP53'}, number: 3
Term: Genes Related To Primary Cilium Development Based On CRISPR WP4536, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Genes Related To Primary Cilium Development Based On CRISPR WP4536, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): set(), number: 0
Term: Genes Related To Primary Cilium Development Based On CRISPR WP4536, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): set(), number: 0
Term: Genes Related To Primary Cilium Development Based On CRISPR WP4536, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): set(), number: 0
Term: Genes Related To Primary Cilium Development Based On CRISPR WP4536, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): set(), number: 0
Term: NRF2 Pathway WP2884, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'SOD3', 'HMOX1', 'GSR', 'GPX3', 'GCLC'}, number: 5
Term: NRF2 Pathway WP2884, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'FTH1', 'GPX3', 'G6PD', 'PGD', 'GSTA5', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'HSP90AB1', 'SLC6A2', 'GPX2', 'DNAJB1', 'HMOX1', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'CBR1', 'ABCC3', 'MAFG', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'KEAP1', 'RXRA', 'UGT1A4', 'TXNRD3', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 60
Term: NRF2 Pathway WP2884, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'FTH1', 'GPX3', 'G6PD', 'PGD', 'GSTA5', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'HSP90AB1', 'SLC6A2', 'GPX2', 'DNAJB1', 'HMOX1', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'CBR1', 'ABCC3', 'MAFG', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'KEAP1', 'RXRA', 'UGT1A4', 'TXNRD3', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 60
Term: NRF2 Pathway WP2884, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'FTH1', 'GPX3', 'G6PD', 'PGD', 'GSTA5', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'HSP90AB1', 'SLC6A2', 'GPX2', 'DNAJB1', 'HMOX1', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'CBR1', 'ABCC3', 'MAFG', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'KEAP1', 'RXRA', 'UGT1A4', 'TXNRD3', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 60
Term: NRF2 Pathway WP2884, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'FTH1', 'GPX3', 'G6PD', 'PGD', 'GSTA5', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'HSP90AB1', 'SLC6A2', 'GPX2', 'DNAJB1', 'HMOX1', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'CBR1', 'ABCC3', 'MAFG', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'KEAP1', 'RXRA', 'UGT1A4', 'TXNRD3', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 60
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'SOD3', 'HMOX1', 'GSR', 'GPX3', 'CYP1A1', 'GCLC'}, number: 6
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'FTH1', 'G6PD', 'GPX3', 'GSTA5', 'PGD', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SRC', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'GPX2', 'HSP90AB1', 'SLC6A2', 'DNAJB1', 'HMOX1', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'ABCC3', 'MAFG', 'CBR1', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'UGT1A4', 'KEAP1', 'TXNRD3', 'RXRA', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 61
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'CPT2', 'ACAA1', 'CPT1A', 'FTH1', 'G6PD', 'GPX3', 'EHHADH', 'GSTA5', 'PGD', 'SLC39A12', 'SLC6A20', 'APOA1', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'GPX2', 'HSP90AB1', 'SLC6A2', 'DNAJB1', 'CYP1A1', 'HMOX1', 'SCP2', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'ABCC3', 'JUN', 'MAFG', 'CBR1', 'CES5A', 'SLC39A11', 'SOD3', 'PCK1', 'SLC5A10', 'UGT1A4', 'KEAP1', 'TXNRD3', 'RXRA', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 69
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'GADD45B', 'FTH1', 'G6PD', 'GPX3', 'GSTA5', 'PGD', 'NFKB2', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SRC', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'GPX2', 'HSP90AB1', 'SLC6A2', 'DNAJB1', 'HMOX1', 'CDK1', 'SCP2', 'TGFB1', 'SLC6A8', 'ABCC2', 'SLC5A5', 'ABCC3', 'JUN', 'MAFG', 'CBR1', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'UGT1A4', 'KEAP1', 'TXNRD3', 'RXRA', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'GCLC', 'SLC6A1', 'SLC2A4', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 66
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'IRS2', 'SLC2A4', 'CPT1A', 'FTH1', 'NR0B2', 'G6PD', 'GPX3', 'ABCG8', 'GSTA5', 'PGD', 'SLC39A12', 'SLC6A20', 'ABCC4', 'SLC5A12', 'SLC5A2', 'TGFBR2', 'SRC', 'SLC6A6', 'SLC5A4', 'SERPINA1', 'GPX2', 'HSP90AB1', 'SLC6A2', 'DNAJB1', 'HMOX1', 'TGFB1', 'SLC6A8', 'ABCC2', 'IP6K3', 'SLC5A5', 'ABCC3', 'MAFG', 'CBR1', 'CES5A', 'SLC39A11', 'SOD3', 'SLC5A10', 'UGT1A4', 'KEAP1', 'TXNRD3', 'RXRA', 'SLC2A1', 'NRG1', 'EGR1', 'MAFF', 'SLC39A10', 'HSPA1A', 'SLC5A9', 'SLC6A9', 'ALDH3A1', 'SRXN1', 'SLC6A5', 'BAAT', 'GSTA4', 'SLC6A19', 'SLC6A17', 'GSR', 'HSP90AA1', 'CES2', 'FASN', 'SLC6A1', 'ABCG5', 'GCLC', 'FKBP5', 'SLC6A7', 'SLC2A14', 'HBEGF', 'UGT1A9'}, number: 70
Term: Osteoblast Differentiation And Related Diseases WP4787, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'MAPK14'}, number: 1
Term: Osteoblast Differentiation And Related Diseases WP4787, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): set(), number: 0
Term: Osteoblast Differentiation And Related Diseases WP4787, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'WNT3A', 'MAPK14', 'FZD7', 'WNT1', 'WNT11', 'WNT7B', 'WNT7A', 'WNT6', 'LRP5', 'FZD10', 'FZD2', 'FZD9', 'FZD3', 'WNT10A'}, number: 14
Term: Osteoblast Differentiation And Related Diseases WP4787, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'WNT3A', 'PIK3R3', 'PIK3C2B', 'WNT1', 'WNT11', 'PIK3R1', 'WNT7B', 'WNT6', 'PIK3R5', 'WNT7A', 'WNT10A'}, number: 11
Term: Osteoblast Differentiation And Related Diseases WP4787, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'PIK3R3', 'PIK3R1'}, number: 2
Term: Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'GCLC', 'FOS', 'MAPK14', 'HMOX1'}, number: 4
Term: Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'ABCC4', 'SRXN1', 'KEAP1', 'HMOX1', 'GCLC', 'ABCC2', 'ABCC3'}, number: 7
Term: Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'ABCC4', 'SRXN1', 'MAPK14', 'KEAP1', 'HMOX1', 'GCLC', 'FOS', 'ABCC2', 'JUN', 'ABCC3'}, number: 10
Term: Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'ABCC4', 'SRXN1', 'KEAP1', 'HMOX1', 'GCLC', 'JUN', 'ABCC2', 'ABCC3'}, number: 8
Term: Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'ABCC4', 'SRXN1', 'KEAP1', 'HMOX1', 'GCLC', 'ABCC2', 'ABCC3', 'EPHX1'}, number: 8
Term: Pluripotent Stem Cell Differentiation Pathway WP2848, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Pluripotent Stem Cell Differentiation Pathway WP2848, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'TGFB1'}, number: 1
Term: Pluripotent Stem Cell Differentiation Pathway WP2848, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'WNT3A', 'TGFB1', 'WNT7B', 'WNT1'}, number: 4
Term: Pluripotent Stem Cell Differentiation Pathway WP2848, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'WNT3A', 'TGFB1', 'WNT7B', 'WNT1'}, number: 4
Term: Pluripotent Stem Cell Differentiation Pathway WP2848, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'TGFB1'}, number: 1
Term: Proximal Tubule Transport WP4917, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Proximal Tubule Transport WP4917, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'ABCC4', 'SLC5A2', 'SLC2A1', 'SLC6A19', 'ABCC2', 'SLC5A5', 'SLC6A20'}, number: 9
Term: Proximal Tubule Transport WP4917, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'ABCC4', 'SLC5A2', 'SLC2A1', 'SLC6A19', 'ABCC2', 'SLC5A5', 'SLC6A20'}, number: 9
Term: Proximal Tubule Transport WP4917, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'ABCC4', 'SLC5A2', 'SLC2A1', 'SLC6A19', 'ABCC2', 'SLC5A5', 'SLC6A20'}, number: 9
Term: Proximal Tubule Transport WP4917, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'SLC2A2', 'SLC5A1', 'ABCC4', 'SLC5A2', 'SLC2A1', 'SLC6A19', 'ABCC2', 'SLC5A5', 'SLC6A20'}, number: 9
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'CYP1A1', 'NOX1'}, number: 2
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'G6PD', 'RXRA', 'TGFB1', 'SLC2A4'}, number: 4
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'GADD45A', 'SFRP1', 'RXRA', 'TGFB1', 'NOX1', 'LRP5', 'G6PD', 'CYP1A1', 'SLC2A4', 'IRF5'}, number: 10
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/244, Title of overlapping gene(s): {'GADD45A', 'RXRA', 'TGFB1', 'G6PD', 'SLC2A4'}, number: 5
Term: Vitamin D Receptor Pathway WP2877, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'G6PD', 'RXRA', 'TGFB1', 'SLC2A4'}, number: 4
Section 7.5.4 Tabulation gene overlap
In this section, a table is created that contains the number of overlapping genes and number of total genes in preparation for section 7.5.5.
final_geneoverlaptable_AGNP_24H=pd.DataFrame.from_dict(overlapping_genes_betweenORA_and_significantKEs3,orient='index')
Section 7.5.5 Percent overlap calculation
In this section, the percent overlap for the gene sets is calculated.
Step 57. Lastly, the percent overlap is calculated and the result is added as a column to the dataframe. This is first done by running a for loop to count the total number of genes belonging to each enriched pathway from ORA.
variable_count3= {}
for index, row in exploded_df3_ORApathwaytable.iterrows():
    unique_KE = row['Term']
    gene_expression_value = row['Genes']
    if unique_KE not in variable_count3:
        variable_count3[unique_KE] = 1
    else:
        variable_count3[unique_KE] += 1
print("The total number of genes: ")
print(variable_count3)
The total number of genes:
{'Ciliopathies WP4803': 81, 'Genes Related To Primary Cilium Development Based On CRISPR WP4536': 50, 'Pluripotent Stem Cell Differentiation Pathway WP2848': 26, 'Bardet Biedl Syndrome WP5234': 41, 'NRF2 Pathway WP2884': 60, 'Nuclear Receptors Meta Pathway WP2882': 118, 'Photodynamic Therapy Induced NFE2L2 NRF2 Survival Signaling WP3612': 15, 'Proximal Tubule Transport WP4917': 29, 'Osteoblast Differentiation And Related Diseases WP4787': 51, 'Vitamin D Receptor Pathway WP2877': 73, 'G1 To S Cell Cycle Control WP45': 31}
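The counting loop above amounts to counting how many gene rows belong to each term. As a minimal sketch of the same idea, assuming a handful of hypothetical (Term, gene) rows in place of exploded_df3_ORApathwaytable, the standard-library Counter gives the same kind of dictionary:

```python
from collections import Counter

# Hypothetical (Term, gene) rows standing in for exploded_df3_ORApathwaytable;
# the pathway and gene names are placeholders, not values from the dataset.
rows = [
    ("Pathway A", "GENE1"),
    ("Pathway A", "GENE2"),
    ("Pathway B", "GENE3"),
    ("Pathway A", "GENE4"),
]

# Count how many gene rows each term contributes.
genes_per_term = Counter(term for term, _gene in rows)

print("The total number of genes: ")
print(dict(genes_per_term))
```

Each row is counted once, so a gene listed twice for the same term would be counted twice, just as in the loop above.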
Step 58. The result is converted into a dataframe and added to the final dataframe. This is followed by some data manipulation prior to calculation of gene set overlap.
variable_count_df3=pd.DataFrame.from_dict(variable_count3,orient='index')
reset_variable_count_df3 = variable_count_df3.reset_index()
reset_variable_count_df3.columns = ['Term', 'Total number of genes']
Genesetoverlaptable_AGNP24H=final_geneoverlaptable_AGNP_24H.reset_index(level=[1])
Genesetoverlaptable_AGNP24H.reset_index(inplace=True)
Genesetoverlaptable_AGNP24H.columns= ['Term','KEID','overlapping genes','number of genes that overlap']
tabulation_AGNP24h=pd.merge(reset_variable_count_df3,Genesetoverlaptable_AGNP24H, on='Term')
def calculate_Genesetoverlap_Score(row):
    return f"{(row['number of genes that overlap']/row['Total number of genes'])*100}"
tabulation_AGNP24h.loc[:,'Percent geneset overlap']= tabulation_AGNP24h.apply(calculate_Genesetoverlap_Score, axis=1)
tabulation_AGNP24h.to_excel('geneoverlap-calculation-AgNP24h.xlsx')
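To make the percent overlap concrete, the sketch below recomputes two values by hand from the AgNP 24H output above: Proximal Tubule Transport WP4917 shares 9 of its 29 genes with KE 1917, and NRF2 Pathway WP2884 shares 60 of its 60 genes with the same KE. The helper name percent_overlap is hypothetical and is not part of the notebook; unlike the function above, it returns a number rather than a string.

```python
def percent_overlap(n_overlap: int, n_total: int) -> float:
    """Share of a pathway's genes that also occur in a KE gene set, in percent."""
    if n_total == 0:
        raise ValueError("the pathway has no genes")
    return n_overlap / n_total * 100


# Counts taken from the AgNP 24H output above (overlap with KE 1917).
proximal_tubule = percent_overlap(9, 29)
nrf2 = percent_overlap(60, 60)

print(f"{proximal_tubule:.2f}")  # 31.03
print(f"{nrf2:.2f}")             # 100.00
```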
Section 8. Comparison 4: AgNP 48H
Section 8.1 Calculation of n variable
In this section, variable n will be calculated for comparison 4: AgNP 48H versus control.
Step 59. The table containing the differentially expressed genes for comparison 4 is loaded and filtered for significance.
AgNP_48H_DEG= pd.read_csv('topTable_AgNP_12.1_48 - H2O.control_.0.0_48.tsv',sep='\t')
AgNP48H_DEG= AgNP_48H_DEG[AgNP_48H_DEG['adj. p-value'] < 0.05]
AgNP48h_DEG= AgNP48H_DEG.copy()
AgNP48h_DEG.rename(columns={AgNP48h_DEG.columns[0]: 'Entrez.Gene'}, inplace=True)
AgNP48h_DEG['Entrez.Gene'] = AgNP48h_DEG['Entrez.Gene'].astype(str)
Step 60. Here, the results of the DEG table are integrated into the mergeddataframe dataframe. This is followed by adjustment of the dataframe columns to remove non-relevant columns.
merged_dataframe_DEG_AgNP_48h= pd.merge(mergeddataframe,AgNP48h_DEG, on='Entrez.Gene')
merged_dataframe_DEG_AgNP_48h
Step 61. The following for loop iterates over the key events to retrieve the n variable. It is comparable to the for loop for N, but adds a condition that checks whether a gene is significant, i.e. whether its adjusted p-value is smaller than 0.05.
variable_n_dictionary_count4= {}
for index, row in merged_dataframe_DEG_AgNP_48h.iterrows():
    unique_KE = row['KEID']
    gene_expression_value = row['adj. p-value']
    if gene_expression_value < 0.05:
        if unique_KE not in variable_n_dictionary_count4:
            variable_n_dictionary_count4[unique_KE] = 1
        else:
            variable_n_dictionary_count4[unique_KE] += 1
print("The total number of significant genes: ")
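The same count can also be obtained without an explicit loop. The following is a minimal sketch, assuming a small hypothetical table with the same 'KEID' and 'adj. p-value' columns as merged_dataframe_DEG_AgNP_48h; it filters on significance first and then counts rows per key event.

```python
import pandas as pd

# Hypothetical KE-to-gene table; KE1/KE2 and the p-values are placeholders.
toy = pd.DataFrame({
    "KEID": ["KE1", "KE1", "KE2", "KE2", "KE2"],
    "adj. p-value": [0.01, 0.20, 0.03, 0.04, 0.50],
})

# Keep only significant genes, then count the remaining rows per key event.
significant = toy[toy["adj. p-value"] < 0.05]
n_per_ke = significant.groupby("KEID").size().to_dict()

print("The total number of significant genes: ")
print(n_per_ke)
```

A key event without any significant gene does not appear in the result, which matches the behaviour of the dictionary-based loop.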
Step 62. The output of the n variable dictionary is saved as a dataframe and integrated as a separate column into a dataframe.
n_variable_dataframe4=pd.DataFrame.from_dict(variable_n_dictionary_count4,orient='index')
n_variable_dataframe4_reset = n_variable_dataframe4.reset_index()
n_variable_dataframe4_reset.columns = ['KEID', 'n']
merged_dataframe3= pd.merge(mergeddataframeDEG, n_variable_dataframe4_reset, on='KEID')
Section 8.2. Calculation of variable B and variable b.
In this section, variable B and variable b are calculated.
Step 63. Variable B is calculated by taking the length of the dataframe which includes all genes in 1 DEG table.
B=len(AgNP_48H_DEG.index)
B
20518
Step 64. Variable b is calculated by taking the length of the dataframe which includes all genes in 1 DEG table with the condition for significance.
AgNP_48H_DEG_filtered=AgNP_48H_DEG[AgNP_48H_DEG['adj. p-value'] < 0.05]
b=len(AgNP_48H_DEG_filtered)
b
2319
Section 8.3. Calculation of enrichment score and hypergeometric p-value
In this section, the enrichment score and hypergeometric p-value will be calculated. This requires the four variables of the enrichment score per KE. The formula is applied to these variables and the results are stored in an additional dataframe.
Step 65. The final dataframe will be created that contains the KEID and the four variables: variable N, variable n, variable B and variable b.
Final_dataframe_ES= merged_dataframe3.loc[:, ['KEID','N','n']]
Final_dataframe_ES['B']=pd.Series([20518 for x in range(len(Final_dataframe_ES.index))])
Final_dataframe_ES['b']=pd.Series([2319 for x in range(len(Final_dataframe_ES.index))])
Final_Dataframe_ES=Final_dataframe_ES.drop_duplicates(subset=['KEID'],keep='first')
Final_Dataframe_ES.reset_index(drop=True,inplace=True)
Copy_Final_DataFrame_ES=Final_Dataframe_ES.copy()
Step 66. The following function will be applied to each row to calculate the enrichment score for the individual key events, and the results will be saved as a separate column in the dataframe.
def calculate_Enrichment_Score(row):
    return f"{(row['n']/row['N'])/(row['b']/row['B'])}"
Copy_Final_DataFrame_ES.loc[:,'Enrichmentscore']= Copy_Final_DataFrame_ES.apply(calculate_Enrichment_Score,axis=1)
Copy_Final_DataFrame_ES
| | KEID | N | n | B | b | Enrichmentscore |
---|---|---|---|---|---|---|
0 | https://identifiers.org/aop.events/1495 | 253 | 36 | 20518 | 2319 | 1.2589725365472033 |
1 | https://identifiers.org/aop.events/1668 | 156 | 19 | 20518 | 2319 | 1.0776141351820523 |
2 | https://identifiers.org/aop.events/244 | 417 | 70 | 20518 | 2319 | 1.4852387171763237 |
3 | https://identifiers.org/aop.events/41 | 275 | 53 | 20518 | 2319 | 1.7052083578344897 |
4 | https://identifiers.org/aop.events/1539 | 170 | 20 | 20518 | 2319 | 1.0409152017857595 |
5 | https://identifiers.org/aop.events/618 | 240 | 19 | 20518 | 2319 | 0.700449187868334 |
6 | https://identifiers.org/aop.events/1497 | 528 | 70 | 20518 | 2319 | 1.1730010323153919 |
7 | https://identifiers.org/aop.events/1115 | 34 | 8 | 20518 | 2319 | 2.081830403571519 |
8 | https://identifiers.org/aop.events/1917 | 166 | 32 | 20518 | 2319 | 1.7055959932875098 |
9 | https://identifiers.org/aop.events/1633 | 1056 | 140 | 20518 | 2319 | 1.1730010323153919 |
10 | https://identifiers.org/aop.events/1392 | 102 | 24 | 20518 | 2319 | 2.081830403571519 |
11 | https://identifiers.org/aop.events/1582 | 51 | 10 | 20518 | 2319 | 1.7348586696429327 |
12 | https://identifiers.org/aop.events/1896 | 205 | 37 | 20518 | 2319 | 1.596916248593275 |
13 | https://identifiers.org/aop.events/265 | 268 | 30 | 20518 | 2319 | 0.9904230464752564 |
14 | https://identifiers.org/aop.events/1750 | 528 | 70 | 20518 | 2319 | 1.1730010323153919 |
15 | https://identifiers.org/aop.events/1848 | 195 | 26 | 20518 | 2319 | 1.179703895357194 |
16 | https://identifiers.org/aop.events/890 | 34 | 8 | 20518 | 2319 | 2.081830403571519 |
17 | https://identifiers.org/aop.events/149 | 1056 | 140 | 20518 | 2319 | 1.1730010323153919 |
18 | https://identifiers.org/aop.events/1579 | 353 | 34 | 20518 | 2319 | 0.8521940320568966 |
19 | https://identifiers.org/aop.events/249 | 34 | 8 | 20518 | 2319 | 2.081830403571519 |
20 | https://identifiers.org/aop.events/288 | 51 | 10 | 20518 | 2319 | 1.7348586696429327 |
21 | https://identifiers.org/aop.events/209 | 617 | 93 | 20518 | 2319 | 1.3336198817044458 |
22 | https://identifiers.org/aop.events/1945 | 1218 | 134 | 20518 | 2319 | 0.9734009974006406 |
23 | https://identifiers.org/aop.events/1087 | 528 | 70 | 20518 | 2319 | 1.1730010323153919 |
24 | https://identifiers.org/aop.events/1538 | 34 | 8 | 20518 | 2319 | 2.081830403571519 |
25 | https://identifiers.org/aop.events/341 | 10 | 1 | 20518 | 2319 | 0.8847779215178957 |
26 | https://identifiers.org/aop.events/1090 | 459 | 49 | 20518 | 2319 | 0.9445341645833745 |
27 | https://identifiers.org/aop.events/352 | 398 | 47 | 20518 | 2319 | 1.0448382490286707 |
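As a worked check of the formula used in Step 66, the sketch below recomputes the enrichment score (n/N)/(b/B) for two rows of the table above. The function name enrichment_score is hypothetical; the input values are copied from the rows for KE 1495 and KE 618.

```python
def enrichment_score(n: int, N: int, b: int, B: int) -> float:
    """Ratio of the significant fraction within a KE to the significant fraction overall."""
    return (n / N) / (b / B)


# Row 0 of the table above: KE 1495.
es_1495 = enrichment_score(n=36, N=253, b=2319, B=20518)
# Row 5 of the table above: KE 618.
es_618 = enrichment_score(n=19, N=240, b=2319, B=20518)

print(f"{es_1495:.4f}")  # 1.2590
print(f"{es_618:.4f}")   # 0.7004
```

A score above 1 means that the KE contains proportionally more significant genes than the DEG table as a whole, and a score below 1 means it contains fewer.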
Step 67. The following for loop will be used to calculate the hypergeometric p-value for individual Key Events and save the result as a separate column into the dataframe. This requires some in between steps for manipulation of the dataframe.
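p_value_dataframe4=[]
for index, row in Copy_Final_DataFrame_ES.iterrows():
    M = row['B']  # population size: all genes in the DEG table
    n = row['b']  # number of significant genes in the DEG table
    N = row['N']  # number of genes linked to the KE
    k = row['n']  # number of significant genes linked to the KE
    hpd = ss.hypergeom(M, n, N)
    # pmf(k) is the probability of exactly k significant genes, not a tail probability
    p = hpd.pmf(k)
    p_value_dataframe4.append(p)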
Hypergeometricpvalue_dataframe4=pd.DataFrame(p_value_dataframe4)
Hypergeometricpvalue_dataframe4.columns= ['Hypergeometric p-value']
merged_finaltable4=pd.concat([Copy_Final_DataFrame_ES,Hypergeometricpvalue_dataframe4],axis=1)
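The loop above stores the hypergeometric point probability P(X = k). Over-representation tests often report the upper tail P(X ≥ k) instead, so it can help to see how the two differ. The sketch below is a standard-library illustration and not the notebook's method: the helper names hypergeom_pmf and hypergeom_tail and the small numbers are hypothetical, and the arguments follow the same naming as the loop (M = B, n = b, N = N, k = n).

```python
from math import comb


def hypergeom_pmf(k: int, M: int, n: int, N: int) -> float:
    """P(X = k): exactly k marked items in N draws from M items, n of which are marked."""
    return comb(n, k) * comb(M - n, N - k) / comb(M, N)


def hypergeom_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k): at least k marked items in the draw."""
    upper = min(n, N)
    return sum(hypergeom_pmf(i, M, n, N) for i in range(k, upper + 1))


# Hypothetical example: 10 genes in total, 4 significant, a gene set of 3 genes,
# of which 1 is significant.
point = hypergeom_pmf(1, M=10, n=4, N=3)
tail = hypergeom_tail(1, M=10, n=4, N=3)

print(point)           # 0.5
print(round(tail, 4))  # 0.8333
```

With scipy, the tail would correspond to ss.hypergeom(M, n, N).sf(k - 1). Whether the point probability or the tail is the appropriate quantity for this analysis is a decision for the analyst.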
Section 8.4. Filtering the results for significant KEs and calculation of percent gene overlap to ORA
In this section, the results will be filtered to only include significant KEs. Significant KEs have an enrichment score above 1 and a hypergeometric p-value below 0.05.
Section 8.4.1 Creation of the significant KEs table
In this section, the dataframes are merged to retrieve the genes connected to only the significant KEs.
Step 85. The significant KE table is created using the significant KEs from the previous mergeddataframe_final.
filteredversion_C5= merged_finaltable4[(merged_finaltable4['Enrichmentscore'].astype(float) > 1) & (merged_finaltable4['Hypergeometric p-value'] < 0.05)]
SignificantKE_list5=filteredversion_C5['KEID'].tolist()
significantKEID_genetable5= mergeddataframe_final[mergeddataframe_final['KEID'].isin(SignificantKE_list5)]
significantKEID_genetable5
| | KEID | WPtitle | ID | gene | Entrez.Gene |
---|---|---|---|---|---|
482 | https://identifiers.org/aop.events/1495 | Cytosolic DNA-sensing pathway | WP4655 | TREX1 | 11277 |
483 | https://identifiers.org/aop.events/1495 | Cytosolic DNA-sensing pathway | WP4655 | IFNA5 | 3442 |
484 | https://identifiers.org/aop.events/1495 | Cytosolic DNA-sensing pathway | WP4655 | IFNA1 | 3439 |
485 | https://identifiers.org/aop.events/1495 | Cytosolic DNA-sensing pathway | WP4655 | IFNA2 | 3440 |
486 | https://identifiers.org/aop.events/1495 | Cytosolic DNA-sensing pathway | WP4655 | IFNA4 | 3441 |
... | ... | ... | ... | ... | ... |
18889 | https://identifiers.org/aop.events/1538 | Oxidative stress response | WP408 | TXNRD2 | 10587 |
18890 | https://identifiers.org/aop.events/1538 | Oxidative stress response | WP408 | MT1X | 4501 |
18891 | https://identifiers.org/aop.events/1538 | Oxidative stress response | WP408 | NOX1 | 27035 |
18892 | https://identifiers.org/aop.events/1538 | Oxidative stress response | WP408 | NFIX | 4784 |
18893 | https://identifiers.org/aop.events/1538 | Oxidative stress response | WP408 | NOX3 | 50508 |
5969 rows × 5 columns
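The filter in Step 85 combines two conditions: an enrichment score above 1 and a hypergeometric p-value below 0.05. Because the enrichment score column holds text (the scoring function returns a formatted string), comparing it numerically requires a conversion first. The following is a minimal sketch of that filter, assuming a small hypothetical table with placeholder key events and values:

```python
import pandas as pd

# Hypothetical results table; the scores are stored as text on purpose.
toy = pd.DataFrame({
    "KEID": ["KE1", "KE2", "KE3"],
    "Enrichmentscore": ["1.26", "0.70", "10.5"],
    "Hypergeometric p-value": [0.01, 0.02, 0.20],
})

# Convert the text scores to numbers before comparing them with 1.
score = pd.to_numeric(toy["Enrichmentscore"])
significant = toy[(score > 1) & (toy["Hypergeometric p-value"] < 0.05)]

print(significant["KEID"].tolist())  # ['KE1']
```

KE2 is dropped because its score is below 1, and KE3 is dropped because its p-value is not below 0.05.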
Section 8.4.2 Significant ORA pathway table plus splitting
In this section, the significant ORA pathway table is created.
Step 86. The significant ORA pathway table is created using the significantly enriched pathways identified from the ORA analysis. This requires data manipulation to restructure the table so that the individual genes of each enriched pathway are placed on individual rows.
datafile_ORA5 = pd.read_csv("C:/Users/shaki/Downloads/ORA_tables_for_comparison/Comparison 4-AgNP-48H.txt", sep='\t')
datafileORA5=pd.DataFrame(datafile_ORA5)
filtereddatafileORA_5=datafileORA5[datafileORA5['Adjusted P-value'] < 0.05]
dropped_datafileORA_df5=filtereddatafileORA_5.drop(['Adjusted P-value','Odds Ratio','Old P-value','Gene_set','P-value','Old adjusted P-value','Combined Score'],axis=1)
droppeddatafileORAdf5=dropped_datafileORA_df5.copy()
droppeddatafileORAdf5['Genes']= droppeddatafileORAdf5['Genes'].replace({';':','},regex=True)
df5_ORApathwaytable=droppeddatafileORAdf5.copy()
df5_ORApathwaytable['Genes'] = df5_ORApathwaytable['Genes'].astype(str)
df5_ORApathwaytable['Genes'] = df5_ORApathwaytable['Genes'].str.split(',')
exploded_df5_ORApathwaytable = df5_ORApathwaytable.explode('Genes', ignore_index=True)
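The restructuring in Step 86 turns one row per pathway into one row per pathway and gene. A minimal sketch of that reshaping, assuming a hypothetical two-pathway table whose 'Genes' column is semicolon-separated like the ORA export:

```python
import pandas as pd

# Hypothetical ORA result with semicolon-separated gene symbols.
toy = pd.DataFrame({
    "Term": ["Pathway A", "Pathway B"],
    "Genes": ["GENE1;GENE2;GENE3", "GENE4"],
})

# Split each gene string into a list, then give every gene its own row.
toy["Genes"] = toy["Genes"].str.split(";")
long_table = toy.explode("Genes", ignore_index=True)

print(long_table)
```

Splitting directly on ';' gives the same rows as first replacing ';' with ',' and then splitting on ',', provided the gene symbols themselves contain no commas.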
Section 8.4.3 For loop to get overlapping genes
In this section, the number of overlapping genes between the significant enrichment score-based Key Events and the enriched pathways from ORA is calculated.
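The overlap between two gene sets is a set intersection. As a minimal illustration before the full procedure, assuming hypothetical pathway and key event gene sets with placeholder names:

```python
# Hypothetical gene sets; names are placeholders, not values from the dataset.
ora_sets = {
    "Pathway A": {"GENE1", "GENE2", "GENE3"},
    "Pathway B": {"GENE4"},
}
ke_sets = {
    "KE1": {"GENE2", "GENE3", "GENE5"},
    "KE2": {"GENE6"},
}

# Intersect every pathway gene set with every key event gene set.
overlaps = {
    (term, ke): pathway_genes & ke_genes
    for term, pathway_genes in ora_sets.items()
    for ke, ke_genes in ke_sets.items()
}

for (term, ke), genes in overlaps.items():
    print(f"{term} x {ke}: {len(genes)} overlaps {sorted(genes)}")
```

Pairs without shared genes are kept with an empty set, so every pathway and key event combination appears in the result.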
Step 87. Next, two sets are created by converting the significant KE table and ORA pathway table into dictionaries where the values of the genes are grouped together per key. This is followed by running a for loop to calculate the number of overlapping genes along with the symbols.
ORA_gene_sets5 = exploded_df5_ORApathwaytable.groupby('Term')['Genes'].apply(set).to_dict()
SignificantKE_gene_sets5 = significantKEID_genetable5.groupby('KEID')['gene'].apply(set).to_dict()
overlapping_genes_betweenORA_and_significantKEs5 = {}
for term, ORA_genes in ORA_gene_sets5.items():
    for KEID, KEID_genes in SignificantKE_gene_sets5.items():
        overlap = ORA_genes.intersection(KEID_genes)
        print(f"{term} x {KEID}: {len(overlap)} overlaps")
        overlapping_genes_betweenORA_and_significantKEs5[(term, KEID)] = {
            'overlapping genes': overlap,
            'number of genes that overlap': len(overlap)
        }
if overlapping_genes_betweenORA_and_significantKEs5:
    print("\ntitle of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:")
    for (term, KEID), result in overlapping_genes_betweenORA_and_significantKEs5.items():
        print(f"Term: {term}, KEID: {KEID}, Title of overlapping gene(s): {result['overlapping genes']}, number: {result['number of genes that overlap']}")
else:
    print("No overlapping genes")
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1087: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1115: 2 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1392: 2 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/149: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1495: 2 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1497: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1538: 2 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1582: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1633: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1750: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1896: 2 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/1917: 1 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/209: 5 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/244 : 5 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/249: 2 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/288: 0 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/41: 3 overlaps
Copper Homeostasis WP3286 x https://identifiers.org/aop.events/890: 2 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1087: 5 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1115: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1392: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/149: 5 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1495: 11 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1497: 5 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1538: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1582: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1633: 5 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1750: 5 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1896: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/1917: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/209: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/244 : 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/249: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/288: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/41: 0 overlaps
Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865 x https://identifiers.org/aop.events/890: 0 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1087: 4 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1115: 5 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1392: 5 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/149: 4 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1495: 1 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1497: 4 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1538: 5 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1582: 1 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1633: 4 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1750: 4 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1896: 2 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/1917: 29 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/209: 34 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/244 : 31 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/249: 5 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/288: 8 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/41: 34 overlaps
Nuclear Receptors Meta Pathway WP2882 x https://identifiers.org/aop.events/890: 5 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1087: 1 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1115: 3 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1392: 3 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/149: 1 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1495: 1 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1497: 1 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1538: 3 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1582: 0 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1633: 1 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1750: 1 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1896: 0 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/1917: 4 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/209: 7 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/244 : 5 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/249: 3 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/288: 0 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/41: 4 overlaps
Selenium Metabolism And Selenoproteins WP28 x https://identifiers.org/aop.events/890: 3 overlaps
title of Overlapping Gene(s) and the number between enriched pathways from ORA and significant KEs:
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TP53', 'JUN', 'AKT1'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'SP1', 'MT1X'}, number: 2
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'SP1', 'MT1X'}, number: 2
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TP53', 'JUN', 'AKT1'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'JUN', 'AKT1'}, number: 2
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TP53', 'JUN', 'AKT1'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'SP1', 'MT1X'}, number: 2
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): {'GSK3B', 'AKT1', 'APC'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TP53', 'JUN', 'AKT1'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TP53', 'JUN', 'AKT1'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1896, Title of overlapping gene(s): {'TP53', 'AKT1'}, number: 2
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'GSK3B'}, number: 1
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'APC', 'SP1', 'GSK3B', 'JUN', 'MT1X'}, number: 5
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/244 , Title of overlapping gene(s): {'TP53', 'AKT1', 'APC', 'GSK3B', 'JUN'}, number: 5
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'SP1', 'MT1X'}, number: 2
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/288, Title of overlapping gene(s): set(), number: 0
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'TP53', 'GSK3B', 'AKT1'}, number: 3
Term: Copper Homeostasis WP3286, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'SP1', 'MT1X'}, number: 2
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'FADD', 'TRAF6', 'CXCL8', 'CHUK', 'IKBKG'}, number: 5
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'FADD', 'TRAF6', 'CXCL8', 'CHUK', 'IKBKG'}, number: 5
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'FADD', 'RNF125', 'ATG12', 'TRADD', 'MAVS', 'TRAF6', 'ATG5', 'CXCL8', 'CHUK', 'IKBKG', 'CYLD'}, number: 11
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'FADD', 'TRAF6', 'CXCL8', 'CHUK', 'IKBKG'}, number: 5
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'FADD', 'TRAF6', 'CXCL8', 'CHUK', 'IKBKG'}, number: 5
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'FADD', 'TRAF6', 'CXCL8', 'CHUK', 'IKBKG'}, number: 5
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1896, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/244 , Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/288, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): set(), number: 0
Term: Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): set(), number: 0
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'TGFBR2', 'TGFB2', 'HSPA1A', 'JUN'}, number: 4
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'NFE2L2', 'HMOX1', 'GPX3', 'CYP1A1', 'SP1'}, number: 5
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'NFE2L2', 'HMOX1', 'GPX3', 'CYP1A1', 'SP1'}, number: 5
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'TGFBR2', 'TGFB2', 'HSPA1A', 'JUN'}, number: 4
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'TGFBR2', 'TGFB2', 'HSPA1A', 'JUN'}, number: 4
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'NFE2L2', 'HMOX1', 'GPX3', 'CYP1A1', 'SP1'}, number: 5
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): {'SRC'}, number: 1
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'TGFBR2', 'TGFB2', 'HSPA1A', 'JUN'}, number: 4
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'TGFBR2', 'TGFB2', 'HSPA1A', 'JUN'}, number: 4
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1896, Title of overlapping gene(s): {'POLK', 'GADD45B'}, number: 2
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'GPX3', 'SLC5A3', 'ABCC4', 'TGFBR2', 'SRC', 'SLC6A15', 'HMOX1', 'GPX2', 'TGFA', 'DNAJB1', 'ABCC3', 'RXRA', 'GSTM3', 'TXNRD3', 'SLC2A1', 'SLC2A8', 'MAFF', 'SLC39A10', 'HSPA1A', 'GGT1', 'SLC6A9', 'ALDH3A1', 'SLC2A6', 'NFE2L2', 'SLC2A3', 'CES2', 'TGFB2', 'SLC2A14', 'HBEGF'}, number: 29
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'CPT1A', 'GPX3', 'SP1', 'SLC5A3', 'ABCC4', 'TGFBR2', 'SLC6A15', 'HMOX1', 'GPX2', 'TGFA', 'DNAJB1', 'SCD', 'CYP1A1', 'JUN', 'ABCC3', 'RXRA', 'GSTM3', 'TXNRD3', 'SLC2A1', 'SLC2A8', 'MAFF', 'SLC39A10', 'HSPA1A', 'GGT1', 'SLC6A9', 'ALDH3A1', 'SLC2A6', 'NFE2L2', 'POLK', 'SLC2A3', 'CES2', 'TGFB2', 'SLC2A14', 'HBEGF'}, number: 34
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/244 , Title of overlapping gene(s): {'GADD45B', 'GPX3', 'SLC5A3', 'ABCC4', 'TGFBR2', 'SRC', 'SLC6A15', 'HMOX1', 'GPX2', 'TGFA', 'DNAJB1', 'JUN', 'ABCC3', 'RXRA', 'GSTM3', 'TXNRD3', 'SLC2A1', 'SLC2A8', 'MAFF', 'SLC39A10', 'HSPA1A', 'GGT1', 'SLC6A9', 'ALDH3A1', 'SLC2A6', 'NFE2L2', 'SLC2A3', 'CES2', 'TGFB2', 'SLC2A14', 'HBEGF'}, number: 31
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'NFE2L2', 'HMOX1', 'GPX3', 'CYP1A1', 'SP1'}, number: 5
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/288, Title of overlapping gene(s): {'ABCB1', 'ABCC4', 'BAAT', 'SRC', 'RXRA', 'PPARGC1A', 'CYP4F12', 'ABCC3'}, number: 8
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'CPT1A', 'GPX3', 'SLC5A3', 'ABCC4', 'TGFBR2', 'SRC', 'SLC6A15', 'HMOX1', 'GPX2', 'TGFA', 'DNAJB1', 'ABCC3', 'RXRA', 'GSTM3', 'TXNRD3', 'SLC2A1', 'SREBF1', 'PPARGC1A', 'SLC2A8', 'MAFF', 'SLC39A10', 'HSPA1A', 'GGT1', 'SLC6A9', 'ALDH3A1', 'SLC2A6', 'NFE2L2', 'BAAT', 'SLC2A3', 'CES2', 'FASN', 'TGFB2', 'SLC2A14', 'HBEGF'}, number: 34
Term: Nuclear Receptors Meta Pathway WP2882, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'NFE2L2', 'HMOX1', 'GPX3', 'CYP1A1', 'SP1'}, number: 5
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1087, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1115, Title of overlapping gene(s): {'SP1', 'NFE2L2', 'GPX3'}, number: 3
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1392, Title of overlapping gene(s): {'SP1', 'NFE2L2', 'GPX3'}, number: 3
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/149, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1495, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1497, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1538, Title of overlapping gene(s): {'SP1', 'NFE2L2', 'GPX3'}, number: 3
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1582, Title of overlapping gene(s): set(), number: 0
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1633, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1750, Title of overlapping gene(s): {'JUN'}, number: 1
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1896, Title of overlapping gene(s): set(), number: 0
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/1917, Title of overlapping gene(s): {'NFE2L2', 'TXNRD3', 'GPX2', 'GPX3'}, number: 4
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/209, Title of overlapping gene(s): {'NFE2L2', 'GPX4', 'TXNRD3', 'GPX2', 'GPX3', 'SP1', 'JUN'}, number: 7
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/244 , Title of overlapping gene(s): {'NFE2L2', 'TXNRD3', 'GPX2', 'GPX3', 'JUN'}, number: 5
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/249, Title of overlapping gene(s): {'SP1', 'NFE2L2', 'GPX3'}, number: 3
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/288, Title of overlapping gene(s): set(), number: 0
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/41, Title of overlapping gene(s): {'NFE2L2', 'TXNRD3', 'GPX2', 'GPX3'}, number: 4
Term: Selenium Metabolism And Selenoproteins WP28, KEID: https://identifiers.org/aop.events/890, Title of overlapping gene(s): {'SP1', 'NFE2L2', 'GPX3'}, number: 3
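Each line in the listing above pairs one enriched ORA pathway with one significant KE and reports the genes they share. The following is a minimal sketch of that pairing as a set intersection, assuming each pathway and each KE is available as a Python set of gene symbols. The dictionaries `toy_pathway_genes` and `toy_ke_genes` and their contents are hypothetical, and the notebook's own loop may be organised differently.

```python
from itertools import product

# Hypothetical toy inputs: gene symbols per pathway and per Key Event.
toy_pathway_genes = {
    "Pathway A": {"JUN", "TP53", "AKT1"},
    "Pathway B": {"SP1"},
}
toy_ke_genes = {
    "KE 1": {"JUN", "AKT1", "GSK3B"},
    "KE 2": {"MT1X"},
}

# Pair every pathway with every KE and keep the shared genes and their count.
toy_result = {}
for (term, pathway_set), (keid, ke_set) in product(
    toy_pathway_genes.items(), toy_ke_genes.items()
):
    shared = pathway_set & ke_set  # set intersection
    toy_result[(term, keid)] = (shared, len(shared))

for (term, keid), (shared, number) in toy_result.items():
    print(f"Term: {term}, KEID: {keid}, overlapping gene(s): {shared}, number: {number}")
```

Pairs without shared genes yield an empty set and a count of 0, which matches the `set(), number: 0` entries in the listing.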
Section 8.4.4 Tabulation gene overlap
In this section, a table is created that contains the number of overlapping genes and the total number of genes, in preparation for Section 8.4.5.
final_geneoverlaptable_C5=pd.DataFrame.from_dict(overlapping_genes_betweenORA_and_significantKEs5,orient='index')
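The call above turns the overlap dictionary into a table with one row per pathway and KE combination. As a minimal sketch of the same tabulation with the row index built explicitly, assume the dictionary is keyed by (Term, KEID) tuples and stores the gene set together with its size. The names `toy_overlaps`, `toy_index` and `toy_table` are hypothetical, and the real dictionary in this notebook may be laid out differently.

```python
import pandas as pd

# Hypothetical toy input: keys are (Term, KEID) tuples, values are
# (overlapping gene set, number of overlapping genes).
toy_overlaps = {
    ("Copper Homeostasis WP3286", "https://identifiers.org/aop.events/1115"): ({"SP1", "MT1X"}, 2),
    ("Copper Homeostasis WP3286", "https://identifiers.org/aop.events/288"): (set(), 0),
}

# Build the two-level row index explicitly, then flatten it into columns.
toy_index = pd.MultiIndex.from_tuples(list(toy_overlaps), names=["Term", "KEID"])
toy_table = pd.DataFrame(
    list(toy_overlaps.values()),
    index=toy_index,
    columns=["overlapping genes", "number of genes that overlap"],
).reset_index()

print(toy_table)
```

Naming the index levels up front means the later renaming of columns is not needed in this sketch.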
Section 8.4.5 Percent overlap calculation
In this section, the percent overlap for the gene sets is calculated.
Step 87. Lastly, the percent overlap is calculated and the result is added as a column to the dataframe. First, a for loop counts the total number of genes belonging to each enriched ORA pathway.
variable_count5 = {}
# The exploded table has one row per gene, so counting rows per Term
# gives the total number of genes for each enriched ORA pathway.
for index, row in exploded_df5_ORApathwaytable.iterrows():
    unique_KE = row['Term']
    gene_expression_value = row['Genes']
    if unique_KE not in variable_count5:
        variable_count5[unique_KE] = 1
    else:
        variable_count5[unique_KE] += 1
print("The total number of genes: ")
print(variable_count5)
The total number of genes:
{'Copper Homeostasis WP3286': 18, 'Nuclear Receptors Meta Pathway WP2882': 61, 'Selenium Metabolism And Selenoproteins WP28': 15, 'Novel Intracellular Components Of RIG I Like Receptor Pathway WP3865': 17}
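The loop above tallies the rows per pathway term. A minimal sketch of the same tally with a pandas group-by is given below, assuming the exploded table holds one row per (Term, gene) pair. The frame `toy_exploded` and its contents are hypothetical stand-ins for `exploded_df5_ORApathwaytable`.

```python
import pandas as pd

# Hypothetical stand-in for the exploded ORA pathway table:
# one row per (pathway term, gene) pair.
toy_exploded = pd.DataFrame({
    "Term": ["Pathway A", "Pathway A", "Pathway B", "Pathway A"],
    "Genes": ["JUN", "TP53", "SP1", "AKT1"],
})

# Count rows per term; sort=False keeps the order of first appearance,
# which mirrors the insertion order of the dictionary built by the loop.
toy_counts = toy_exploded.groupby("Term", sort=False).size().to_dict()
print(toy_counts)
```

Like the loop, this counts rows rather than unique genes. If a gene could appear more than once per term, `toy_exploded.groupby("Term", sort=False)["Genes"].nunique()` would count distinct genes instead.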
Step 88. The counts are converted into a dataframe and merged with the gene overlap table on the 'Term' column. The percent gene set overlap is then calculated for each row and the resulting table is exported to an Excel file.
# Turn the per-pathway gene counts into a two-column dataframe.
variable_count_df5=pd.DataFrame.from_dict(variable_count5,orient='index')
reset_variable_count_df5 = variable_count_df5.reset_index()
reset_variable_count_df5.columns = ['Term', 'Total number of genes']
# Flatten the index of the overlap table into 'Term' and 'KEID' columns.
Genesetoverlaptable_C5=final_geneoverlaptable_C5.reset_index(level=[1])
Genesetoverlaptable_C5.reset_index(inplace=True)
Genesetoverlaptable_C5.columns= ['Term','KEID','overlapping genes','number of genes that overlap']
# Add the total gene count of each pathway to every pathway-KE row.
tabulation_C5=pd.merge(reset_variable_count_df5,Genesetoverlaptable_C5, on='Term')
def calculate_Genesetoverlap_Score(row):
    # Percentage of the pathway's genes shared with the KE, returned as a string.
    return f"{(row['number of genes that overlap']/row['Total number of genes'])*100}"
tabulation_C5.loc[:,'Percent geneset overlap']= tabulation_C5.apply(calculate_Genesetoverlap_Score, axis=1)
tabulation_C5.to_excel('genesetoverlap-AgNP48h.xlsx')
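The percent gene set overlap is the number of overlapping genes divided by the total number of genes in the pathway, multiplied by 100. As a worked example with values taken from the outputs above: Copper Homeostasis WP3286 has 18 genes, of which 2 overlap with KE 1115, and Nuclear Receptors Meta Pathway WP2882 has 61 genes, of which 34 overlap with KE 209. The sketch below computes this column-wise and keeps the result numeric, which is a variation on the row-wise function used in the notebook. The frame name `toy_tabulation` is hypothetical.

```python
import pandas as pd

# Two rows copied from the outputs above; column names follow the notebook.
toy_tabulation = pd.DataFrame({
    "Term": ["Copper Homeostasis WP3286", "Nuclear Receptors Meta Pathway WP2882"],
    "KEID": ["https://identifiers.org/aop.events/1115", "https://identifiers.org/aop.events/209"],
    "Total number of genes": [18, 61],
    "number of genes that overlap": [2, 34],
})

# Column-wise percentage, kept as a float so it can be sorted or filtered.
toy_tabulation["Percent geneset overlap"] = (
    toy_tabulation["number of genes that overlap"]
    / toy_tabulation["Total number of genes"]
    * 100
).round(2)

print(toy_tabulation[["Term", "Percent geneset overlap"]])
```

Rounded to two decimals, this gives 11.11 for the Copper Homeostasis row and 55.74 for the Nuclear Receptors row.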
Section 9: Metadata
Step 89. Finally, the metadata for this Jupyter Notebook is displayed, including the version numbers of the packages and the system set-up, for interested users. This requires the packages watermark and print_versions.
%load_ext watermark
!pip install print-versions
Requirement already satisfied: print-versions in c:\users\shaki\anaconda3\lib\site-packages (0.1.0)
%watermark
Last updated: 2025-06-02T19:48:45.303185+02:00
Python implementation: CPython
Python version : 3.12.3
IPython version : 8.25.0
Compiler : MSC v.1938 64 bit (AMD64)
OS : Windows
Release : 11
Machine : AMD64
Processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores : 8
Architecture: 64bit
from print_versions import print_versions
print_versions(globals())
pandas==2.2.3
json==2.0.9
ipykernel==6.28.0
numpy==1.26.4
scipy==1.13.1
py4cytoscape==1.9.0
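For readers who prefer not to install an extra package, installed versions can also be looked up with the Python standard library. This is a minimal sketch and not the method used in this notebook. The helper name `installed_version` and the list of package names are assumptions chosen for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as they would be given to pip; adjust as needed.
for name in ["pandas", "numpy", "scipy", "py4cytoscape"]:
    print(f"{name}=={installed_version(name)}")
```

Unlike `print_versions(globals())`, this looks up packages by name instead of inspecting what has been imported, so the list has to be maintained by hand.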
References:
- Martens M, Meuleman AB, Kearns J, de Windt C, Evelo CT, Willighagen EL. Molecular Adverse Outcome Pathways: towards the implementation of transcriptomics data in risk assessments. bioRxiv. 2023:2023.03.02.530766.
- How can I iterate over rows in a Pandas DataFrame? [Internet]. Stack Overflow. Available from: https://stackoverflow.com/questions/16476924/how-can-i-iterate-over-rows-in-a-pandas-dataframe
- Python - Loop Dictionaries [Internet]. www.w3schools.com. Available from: https://www.w3schools.com/python/python_dictionaries_loop.asp
- Priya. apply(set) to two columns in a pandas dataframe [Internet]. Stack Overflow. 2018. Available from: https://stackoverflow.com/questions/52367388/applyset-to-two-columns-in-a-pandas-dataframe
- amnesic. Converting pandas dataframe to dictionary with same keys over multiple rows [Internet]. Stack Overflow. 2022. Available from: https://stackoverflow.com/questions/71006325/converting-pandas-dataframe-to-dictionary-with-same-keys-over-multiple-rows/71006478#71006478
- SuperDougDougy. GroupBy results to dictionary of lists [Internet]. Stack Overflow. 2015. Available from: https://stackoverflow.com/questions/29876184/groupby-results-to-dictionary-of-lists