Part 1: Construction of the preliminary AOP table for AOP networks

The AOP project ► Key objective 1

Author: Shakira Agata

This Jupyter Notebook describes the steps needed to create the preliminary table required for the construction of an AOP network that focuses on inflammatory processes in human organ systems. The notebook covers the organ systems brain, liver, kidney and lung, reflecting the research interests of the author. The preliminary table will contain the following information: AOP (adverse outcome pathway), AOP title, KE name (key event name), AO (adverse outcome), AO title, KER (key event relationship), KER ID and the title of the organ system. To achieve this, three SPARQL queries will be executed against the AOP-Wiki RDF to extract the AOPs related to inflammatory processes along with their respective upstreamKEs, downstreamKEs and KERs. The detailed steps are outlined in the following eight sections:

  • Section 1: System preparation for generation of AOP-Wiki RDF SPARQL queries
  • Section 2: Execution of the first SPARQL query
  • Section 3: Merging of two datasets
  • Section 4: Filtering of results
  • Section 5: Execution of second SPARQL query
  • Section 6: Execution of third SPARQL query
  • Section 7: Merging results from section 4-6
  • Section 8: Metadata

Section 1: System preparation for generation of AOP-Wiki RDF SPARQL queries

In sections 1 and 2, the system requirements for this notebook will be fulfilled, followed by the generation of the first SPARQL query. This query is run against the AOP-Wiki RDF.

Step 1: Install SPARQLWrapper, the Python wrapper you need to be able to run the queries.

pip install sparqlwrapper
Requirement already satisfied: sparqlwrapper in c:\users\shaki\anaconda3\lib\site-packages (2.0.0)
Requirement already satisfied: rdflib>=6.1.1 in c:\users\shaki\anaconda3\lib\site-packages (from sparqlwrapper) (7.0.0)
Requirement already satisfied: isodate<0.7.0,>=0.6.0 in c:\users\shaki\anaconda3\lib\site-packages (from rdflib>=6.1.1->sparqlwrapper) (0.6.1)
Requirement already satisfied: pyparsing<4,>=2.1.0 in c:\users\shaki\anaconda3\lib\site-packages (from rdflib>=6.1.1->sparqlwrapper) (3.0.9)
Requirement already satisfied: six in c:\users\shaki\anaconda3\lib\site-packages (from isodate<0.7.0,>=0.6.0->rdflib>=6.1.1->sparqlwrapper) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

Step 2: Import sys, SPARQLWrapper and pandas, the packages that allow you to interact with system variables and functions and to manipulate data. For pandas, the maximum column width is set to 'None', as pandas 2.2.2 no longer accepts a negative integer for this option.

import sys

!{sys.executable} -m pip install watermark
from SPARQLWrapper import SPARQLWrapper, JSON


import pandas as pd

pd.set_option('display.max_colwidth', None)
Requirement already satisfied: watermark in c:\users\shaki\anaconda3\lib\site-packages (2.4.3)
Requirement already satisfied: ipython>=6.0 in c:\users\shaki\anaconda3\lib\site-packages (from watermark) (8.25.0)
Requirement already satisfied: importlib-metadata>=1.4 in c:\users\shaki\anaconda3\lib\site-packages (from watermark) (7.0.1)
Requirement already satisfied: setuptools in c:\users\shaki\anaconda3\lib\site-packages (from watermark) (69.5.1)
Requirement already satisfied: zipp>=0.5 in c:\users\shaki\anaconda3\lib\site-packages (from importlib-metadata>=1.4->watermark) (3.17.0)
Requirement already satisfied: decorator in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (5.1.1)
Requirement already satisfied: jedi>=0.16 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.18.1)
Requirement already satisfied: matplotlib-inline in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.1.6)
Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.41 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (3.0.43)
Requirement already satisfied: pygments>=2.4.0 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (2.15.1)
Requirement already satisfied: stack-data in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.2.0)
Requirement already satisfied: traitlets>=5.13.0 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (5.14.3)
Requirement already satisfied: colorama in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.4.6)
Requirement already satisfied: wcwidth in c:\users\shaki\anaconda3\lib\site-packages (from prompt-toolkit<3.1.0,>=3.0.41->ipython>=6.0->watermark) (0.2.5)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in c:\users\shaki\anaconda3\lib\site-packages (from jedi>=0.16->ipython>=6.0->watermark) (0.8.3)
Requirement already satisfied: executing in c:\users\shaki\anaconda3\lib\site-packages (from stack-data->ipython>=6.0->watermark) (0.8.3)
Requirement already satisfied: asttokens in c:\users\shaki\anaconda3\lib\site-packages (from stack-data->ipython>=6.0->watermark) (2.0.5)
Requirement already satisfied: pure-eval in c:\users\shaki\anaconda3\lib\site-packages (from stack-data->ipython>=6.0->watermark) (0.2.2)
Requirement already satisfied: six in c:\users\shaki\anaconda3\lib\site-packages (from asttokens->stack-data->ipython>=6.0->watermark) (1.16.0)

Step 3: Create the variable AOPWikiSPARQL for the SPARQL wrapper and set the return format to JSON so that the results come back in a structured format that is easy to parse into a dataframe.

AOPWikiSPARQL = SPARQLWrapper("https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/")
AOPWikiSPARQL.setReturnFormat(JSON) 

Step 4: Define the triple components, core types, ontologies and identifiers that are used to codify the semantic data in AOP-Wiki.

triple = ['subject','predicate','object']
coretypes = ['aopo:AdverseOutcomePathway','aopo:KeyEvent','aopo:KeyEventRelationship','ncbitaxon:131567','go:0008150','pato:0001241','pato:0000001','aopo:CellTypeContext','aopo:OrganContext','nci:C54571','cheminf:000000'] 
ontologies = ['http://aopkb.org/aop_ontology#','http://edamontology.org/','http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#','http://purl.bioontology.org/ontology/NCBITAXON/','http://purl.obolibrary.org/obo/MMO','http://purl.obolibrary.org/obo/CL_','http://purl.obolibrary.org/obo/UBERON_','http://purl.obolibrary.org/obo/MI_','http://purl.obolibrary.org/obo/MP_','http://purl.org/commons/record/mesh/','http://purl.obolibrary.org/obo/HP_','http://purl.obolibrary.org/obo/PCO_','http://purl.obolibrary.org/obo/NBO_','http://purl.obolibrary.org/obo/VT_','http://purl.obolibrary.org/obo/PR_','http://purl.obolibrary.org/obo/CHEBI_','http://purl.org/sig/ont/fma/fma','http://xmlns.com/foaf/0.1/','http://www.w3.org/2004/02/skos/core#','http://www.w3.org/2000/01/rdf-schema#','http://www.w3.org/1999/02/22-rdf-syntax-ns#','http://semanticscience.org/resource/CHEMINF_','http://purl.obolibrary.org/obo/GO_','http://purl.org/dc/terms/','http://purl.org/dc/elements/1.1/','http://purl.obolibrary.org/obo/PATO_']
identifiers = []
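The queries below use prefixes such as aopo:, dc: and rdfs: without declaring them, because the AOP-Wiki SPARQL endpoint predefines them. When reusing the same queries against a generic SPARQL client, the prefixes can be declared explicitly. A minimal sketch (the IRIs are taken from the ontologies list above; note that AOP-Wiki RDF maps dc: to Dublin Core elements 1.1, which also appears in that list):

```python
# Explicit PREFIX declarations, only needed when the endpoint does not
# predefine them (the AOP-Wiki endpoint does).
SPARQL_PREFIXES = '''PREFIX aopo: <http://aopkb.org/aop_ontology#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
'''

def with_prefixes(query):
    """Prepend explicit PREFIX declarations to a SPARQL query string."""
    return SPARQL_PREFIXES + query
```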

Section 2: Execution of the first SPARQL query

Step 5: Define and run the SPARQL query to extract the inflammation-related AOPs. Subsequently, convert the results to a pandas dataframe (df1) and display it.

first_sparqlquery = '''SELECT DISTINCT ?AOP ?AOPtitle ?KE ?KEname
WHERE {
  ?KE a aopo:KeyEvent ;
      dc:identifier ?KElookup ;
      dc:title ?KEname .
  ?AOP a aopo:AdverseOutcomePathway ;
       aopo:has_key_event ?KElookup ;
       dc:title ?AOPtitle .

  FILTER regex(?KEname, "inflammation|inflammatory", "i")
}
ORDER BY DESC(?AOP)'''

AOPWikiSPARQL.setQuery(first_sparqlquery)
first_results = AOPWikiSPARQL.query().convert()

first_data = first_results["results"]["bindings"]

# Each binding becomes one row in the dataframe
rows = [{
    "AOP": item["AOP"]["value"],
    "AOPtitle": item["AOPtitle"]["value"],
    "KE": item["KE"]["value"],
    "KEname": item["KEname"]["value"]
} for item in first_data]

df1 = pd.DataFrame(rows)
display(df1)

AOP AOPtitle KE KEname
0 https://identifiers.org/aop/62 AKT2 activation leading to hepatic steatosis https://identifiers.org/aop.events/486 systemic inflammation leading to hepatic steatosis
1 https://identifiers.org/aop/544 Inhibition of neuropathy target esterase leading to delayed neuropathy via increased inflammation https://identifiers.org/aop.events/149 Increase, Inflammation
2 https://identifiers.org/aop/535 Binding and activation of GPER leading to learning and memory impairments https://identifiers.org/aop.events/188 Neuroinflammation
3 https://identifiers.org/aop/511 The AOP framework on ROS-mediated oxidative stress induced vascular disrupting effects https://identifiers.org/aop.events/2009 Activation of inflammation pathway
4 https://identifiers.org/aop/507 Nrf2 inhibition leading to vascular disrupting effects via inflammation pathway https://identifiers.org/aop.events/2009 Activation of inflammation pathway
... ... ... ... ...
67 https://identifiers.org/aop/144 Endocytic lysosomal uptake leading to liver fibrosis https://identifiers.org/aop.events/1493 Increased Pro-inflammatory mediators
68 https://identifiers.org/aop/14 Glucocorticoid Receptor Activation Leading to Increased Disease Susceptibility https://identifiers.org/aop.events/152 Suppression, Inflammatory cytokines
69 https://identifiers.org/aop/12 Chronic binding of antagonist to N-methyl-D-aspartate receptors (NMDARs) during brain development leads to neurodegeneration with impairment in learning and memory in aging https://identifiers.org/aop.events/188 Neuroinflammation
70 https://identifiers.org/aop/115 Epithelial cytotoxicity leading to forestomach tumors (in mouse and rat) https://identifiers.org/aop.events/149 Increase, Inflammation
71 https://identifiers.org/aop/114 HPPD inhibition leading to corneal papillomas and carcinomas (in rat) https://identifiers.org/aop.events/777 Increase, Inflammation (corneal cells)

72 rows × 4 columns
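Note that the same AOP can appear in several rows, once per matching key event. A quick sanity check, not part of the original notebook and sketched here on a toy stand-in for df1, is to compare the number of rows with the number of distinct AOPs:

```python
import pandas as pd

# Toy stand-in for df1: one row per (AOP, key event) pair, so an AOP
# with several inflammation-related KEs appears several times.
toy_df1 = pd.DataFrame({
    'AOP': ['https://identifiers.org/aop/511',
            'https://identifiers.org/aop/511',
            'https://identifiers.org/aop/62'],
    'KEname': ['Activation of inflammation pathway',
               'Neuroinflammation',
               'systemic inflammation leading to hepatic steatosis'],
})

print(len(toy_df1))              # number of (AOP, KE) pairs
print(toy_df1['AOP'].nunique())  # number of distinct AOPs
```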

Section 3: Merging of two datasets

In this section, you will merge two datafiles: df1, which contains the previous SPARQL query results for the inflammation-related AOPs, and table2, a snorql-csv export containing additional AOP annotations that was created by Marvin Martens (supervisor of Shakira Agata). The snorql-csv file has five columns: AOP, AOPName, ao, aotitle and organ, which are merged with df1.

Step 6: You first read the Excel export of the snorql-csv file and convert it to JSON format.

table2= pd.read_excel('snorql-csv-1679052766.894.xlsx', sheet_name='snorql-csv-1679052766.894')
json_table2= table2.to_json()

Step 7: This is followed by reading the JSON-formatted file into a pandas dataframe and verifying the result.

df2= pd.read_json("C:/Users/shaki/Downloads/csvjson.json")

Step 8: Next, you merge the two dataframes based on the shared ‘AOP’ column.

mergedJSONtable= pd.merge(df1,df2, on='AOP')
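pd.merge defaults to an inner join, so AOPs present in only one of the two tables silently disappear. When in doubt, an outer join with indicator=True shows where each row came from. A sketch on toy data (the real call above keeps the default inner join):

```python
import pandas as pd

left = pd.DataFrame({'AOP': ['a1', 'a2'], 'KEname': ['ke1', 'ke2']})
right = pd.DataFrame({'AOP': ['a2', 'a3'], 'organ': ['Liver', 'Brain']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
checked = pd.merge(left, right, on='AOP', how='outer', indicator=True)
print(checked[['AOP', '_merge']])
```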

Section 4: Filtering of results

In this section, you filter the result of section 3 by removing AOPs that are not organ-system based, i.e. that do not belong to brain, kidney, liver or lung.

Step 9: The rows whose 'organ' column contains 'Other' are filtered out so that you only retain AOPs in the four defined organ systems (brain, liver, kidney and lung). This is done with the comparison operator '!=', which keeps only the rows where organ is NOT 'Other'.

filtered_mergedJSONtable = mergedJSONtable[mergedJSONtable['organ'] != 'Other']
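Note that != compares strings case-sensitively, so a lowercase 'other' would survive the filter. A quick check of the remaining organ values, sketched here on toy data, confirms the filter behaves as intended:

```python
import pandas as pd

toy = pd.DataFrame({'AOP': ['a1', 'a2', 'a3', 'a4'],
                    'organ': ['Liver', 'Other', 'Brain', 'Lung']})

# Keep only rows whose organ is not exactly 'Other' (case-sensitive)
filtered = toy[toy['organ'] != 'Other']
print(sorted(filtered['organ'].unique()))
```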

Section 5: Execution of second SPARQL query

In this section, you will run the second SPARQL query to retrieve the upstreamKEs and downstreamKEs of the AOPs retained after filtering in section 4.

Step 10: In preparation for the second SPARQL query, you first collect the URIs of the AOPs so that the execution of the query takes less time and can be automated. The following line selects the 'AOP' column of filtered_mergedJSONtable, wraps each URI in angle brackets and joins them into a single space-separated string.

values_AOPs = " ".join(f"<{AOP}>" for AOP in filtered_mergedJSONtable['AOP'])
print(values_AOPs)
<https://identifiers.org/aop/62> <https://identifiers.org/aop/48> <https://identifiers.org/aop/452> <https://identifiers.org/aop/452> <https://identifiers.org/aop/451> <https://identifiers.org/aop/451> <https://identifiers.org/aop/447> <https://identifiers.org/aop/429> <https://identifiers.org/aop/409> <https://identifiers.org/aop/409> <https://identifiers.org/aop/382> <https://identifiers.org/aop/38> <https://identifiers.org/aop/374> <https://identifiers.org/aop/362> <https://identifiers.org/aop/320> <https://identifiers.org/aop/320> <https://identifiers.org/aop/319> <https://identifiers.org/aop/303> <https://identifiers.org/aop/3> <https://identifiers.org/aop/280> <https://identifiers.org/aop/278> <https://identifiers.org/aop/27> <https://identifiers.org/aop/206> <https://identifiers.org/aop/173> <https://identifiers.org/aop/173> <https://identifiers.org/aop/171> <https://identifiers.org/aop/17> <https://identifiers.org/aop/17> <https://identifiers.org/aop/144> <https://identifiers.org/aop/12>
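As the output shows, the VALUES list contains duplicate URIs, because an AOP occurs once per row of the filtered table. SPARQL accepts duplicates inside a VALUES clause, but deduplicating first keeps the query compact. A sketch (dict.fromkeys removes duplicates while preserving first-seen order):

```python
aops = ['https://identifiers.org/aop/62',
        'https://identifiers.org/aop/452',
        'https://identifiers.org/aop/452',
        'https://identifiers.org/aop/451']

# Deduplicate while keeping the original order
unique_aops = list(dict.fromkeys(aops))
values_unique = " ".join(f"<{aop}>" for aop in unique_aops)
print(values_unique)
```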

Step 11: Now you run the second SPARQL query to retrieve the KE, KEID and KEtitle for the respective AOPs. Because the query is built as a Python f-string, a single pair of curly brackets marks the position where values_AOPs is inserted, while the SPARQL query's own curly brackets are doubled ({{ and }}) so that Python does not interpret them as placeholders (escaping).

second_sparqlquery = f'''
SELECT DISTINCT ?AOP ?KE ?KEID ?KEtitle
WHERE {{
  VALUES ?AOP {{ {values_AOPs} }}
  ?AOP a aopo:AdverseOutcomePathway ;
       dc:title ?AOPName ;
       aopo:has_key_event ?KE .
  ?KE a aopo:KeyEvent ;
      rdfs:label ?KEID ;
      dc:title ?KEtitle .
}}
'''

AOPWikiSPARQL.setQuery(second_sparqlquery)
second_results = AOPWikiSPARQL.query().convert()

second_data = second_results["results"]["bindings"]

# Each binding becomes one row in the dataframe
rows = [{
    "AOP": item["AOP"]["value"],
    "KE": item["KE"]["value"],
    "KEID": item["KEID"]["value"],
    "KEtitle": item["KEtitle"]["value"]
} for item in second_data]

df3 = pd.DataFrame(rows)
display(df3)

AOP KE KEID KEtitle
0 https://identifiers.org/aop/447 https://identifiers.org/aop.events/105 KE 105 Inhibition, Mitochondrial Electron Transport Chain Complexes
1 https://identifiers.org/aop/27 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
2 https://identifiers.org/aop/303 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
3 https://identifiers.org/aop/319 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
4 https://identifiers.org/aop/382 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
... ... ... ... ...
185 https://identifiers.org/aop/62 https://identifiers.org/aop.events/459 KE 459 Increased, Liver Steatosis
186 https://identifiers.org/aop/62 https://identifiers.org/aop.events/484 KE 484 Activation, AKT2
187 https://identifiers.org/aop/62 https://identifiers.org/aop.events/486 KE 486 systemic inflammation leading to hepatic steatosis
188 https://identifiers.org/aop/447 https://identifiers.org/aop.events/759 KE 759 Increased, Kidney Failure
189 https://identifiers.org/aop/451 https://identifiers.org/aop.events/780 KE 780 Increase, Cytotoxicity (epithelial cells)

190 rows × 4 columns

Section 6: Execution of third SPARQL query

In this section, you will run the third SPARQL query to retrieve the KERs that link the upstreamKEs and downstreamKEs of the AOPs. This approach allows for easier integration of the output into the nodetable and edgetable of the AOP network (1).

Step 12: You run the following query to retrieve the upstreamKEs, downstreamKEs and KERs.

third_sparqlquery = f'''
SELECT DISTINCT ?AOP ?upstreamKE ?upstreamKEtitle ?downstreamKE ?downstreamKEtitle ?KER ?KERID
WHERE {{
  VALUES ?AOP {{ {values_AOPs} }}
  ?AOP a aopo:AdverseOutcomePathway ;
       dc:title ?AOPName ;
       aopo:has_key_event_relationship ?KER .
  ?KER a aopo:KeyEventRelationship ;
       rdfs:label ?KERID .

  ?KER aopo:has_upstream_key_event ?upstreamKE .
  ?upstreamKE dc:title ?upstreamKEtitle .

  ?KER aopo:has_downstream_key_event ?downstreamKE .
  ?downstreamKE dc:title ?downstreamKEtitle .
}}
'''
AOPWikiSPARQL.setQuery(third_sparqlquery)
third_results = AOPWikiSPARQL.query().convert()

third_data = third_results["results"]["bindings"]

# Each binding becomes one row in the dataframe
rows = [{
    "AOP": item["AOP"]["value"],
    "upstreamKE": item["upstreamKE"]["value"],
    "downstreamKE": item["downstreamKE"]["value"],
    "KER": item["KER"]["value"],
    "KERID": item["KERID"]["value"],
    "upstreamKEtitle": item["upstreamKEtitle"]["value"],
    "downstreamKEtitle": item["downstreamKEtitle"]["value"]
} for item in third_data]

df4 = pd.DataFrame(rows)

Section 7: Merging results from section 4-6

In this section, you will merge the dataframes: filtered_mergedJSONtable (result of section 4), df3 (result of section 5) and df4 (result of section 6).

Step 13: The three dataframes are merged to obtain the final table that will be used to define the nodes and edges of the network in Py4Cytoscape. The final table is the merged version of:

  • filtered_mergedJSONtable (AOP SPARQL query)
  • df3 (KE SPARQL query)
  • df4 (KER/KERID SPARQL query)

first_merge = pd.merge(df3, df4)
final_result = pd.merge(filtered_mergedJSONtable, first_merge)
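Note that pd.merge without an on argument joins on every column name the two frames share; for df3 and df4 that is the 'AOP' column. A minimal sketch of that behaviour on toy frames:

```python
import pandas as pd

df_a = pd.DataFrame({'AOP': ['a1', 'a1'], 'KE': ['k1', 'k2']})
df_b = pd.DataFrame({'AOP': ['a1'], 'KER': ['r1']})

# No 'on' argument: pandas joins on the shared column(s), here 'AOP'
merged = pd.merge(df_a, df_b)
print(merged)
```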

Step 14: Lastly, the final_result table is stored in JSON format on your machine in preparation for the next Jupyter Notebook.

final_result.to_json('final_result.json')
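To confirm that the table survives the round trip, the JSON can be read back and compared. A sketch using an in-memory buffer rather than the actual file on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({'AOP': ['a1'], 'organ': ['Liver']})

# Write to a JSON string and read it back, mimicking to_json/read_json on disk
buffer = io.StringIO(df.to_json())
restored = pd.read_json(buffer)
print(restored.shape)
```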

Section 8: Metadata

Step 15: Finally, the metadata belonging to this Jupyter Notebook is displayed, which contains the version numbers of the packages and the system set-up for interested users. This uses the packages watermark and print-versions.

%load_ext watermark
!pip install print-versions
Requirement already satisfied: print-versions in c:\users\shaki\anaconda3\lib\site-packages (0.1.0)
%watermark
Last updated: 2025-04-26T20:54:28.526670+02:00

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.25.0

Compiler    : MSC v.1938 64 bit (AMD64)
OS          : Windows
Release     : 11
Machine     : AMD64
Processor   : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores   : 8
Architecture: 64bit
from print_versions import print_versions
print_versions(globals())
json==2.0.9
ipykernel==6.28.0
pandas==2.2.2
SPARQLWrapper==2.0.0

References:

  1. Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Appl In Vitro Toxicol. 2022 Mar 1;8(1):2-13. doi: 10.1089/aivt.2021.0010. Epub 2022 Mar 17. PMID: 35388368; PMCID: PMC8978481. Code: https://github.com/marvinm2/AOPWikiRDF
  2. msx. Package for listing version of packages used in a Jupyter Notebook [Internet]. Stack Overflow. 2016. Available from: https://stackoverflow.com/questions/40428931/package-for-listing-version-of-packages-used-in-a-jupyter-notebook