Part 1: Construction of the preliminary AOP table for AOP networks

The AOP project ► Key objective 1

Author: Shakira Agata

This Jupyter Notebook describes the steps needed to create the preliminary table required for the construction of an AOP network that focuses on inflammatory processes in human organ systems. The notebook covers the organ systems brain, liver, kidney and lung, reflecting the research interests of the author. The preliminary table will contain the following information: AOP (adverse outcome pathway), AOP title, KE name (key event name), AO (adverse outcome), AO title, KER (key event relationship), KER ID and the title of the organ system. To achieve this, three SPARQL queries will be executed against the AOP-Wiki RDF to extract the AOPs related to inflammatory processes along with their respective upstreamKEs, downstreamKEs and KERs. The detailed steps are outlined in the following eight sections:

  • Section 1: System preparation for generation of AOP-Wiki RDF SPARQL queries
  • Section 2: Execution of the first SPARQL query
  • Section 3: Merging of two datasets
  • Section 4: Filtering of results
  • Section 5: Execution of second SPARQL query
  • Section 6: Execution of third SPARQL query
  • Section 7: Merging results from section 4-6
  • Section 8: Metadata

Section 1: System preparation for generation of AOP-Wiki RDF SPARQL queries

In sections 1 and 2, the system requirements for this notebook will be fulfilled, followed by the generation of the first SPARQL query. This query is run against the AOP-Wiki RDF.

Step 1: Install SPARQLWrapper, the Python wrapper you need to be able to run the queries.

pip install sparqlwrapper
Requirement already satisfied: sparqlwrapper in c:\users\shaki\anaconda3\lib\site-packages (2.0.0)
Requirement already satisfied: rdflib>=6.1.1 in c:\users\shaki\anaconda3\lib\site-packages (from sparqlwrapper) (7.0.0)
Requirement already satisfied: isodate<0.7.0,>=0.6.0 in c:\users\shaki\anaconda3\lib\site-packages (from rdflib>=6.1.1->sparqlwrapper) (0.6.1)
Requirement already satisfied: pyparsing<4,>=2.1.0 in c:\users\shaki\anaconda3\lib\site-packages (from rdflib>=6.1.1->sparqlwrapper) (3.0.9)
Requirement already satisfied: six in c:\users\shaki\anaconda3\lib\site-packages (from isodate<0.7.0,>=0.6.0->rdflib>=6.1.1->sparqlwrapper) (1.16.0)
Note: you may need to restart the kernel to use updated packages.

Step 2: Import sys, SPARQLWrapper and pandas, the packages that allow you to interact with system variables and functions and to manipulate data. For pandas, the maximum column width is set to 'None', as pandas 2.2.2 no longer accepts a negative integer for this option.

import sys

!{sys.executable} -m pip install watermark
from SPARQLWrapper import SPARQLWrapper, JSON


import pandas as pd

pd.set_option('display.max_colwidth', None)
Requirement already satisfied: watermark in c:\users\shaki\anaconda3\lib\site-packages (2.4.3)
Requirement already satisfied: ipython>=6.0 in c:\users\shaki\anaconda3\lib\site-packages (from watermark) (8.25.0)
Requirement already satisfied: importlib-metadata>=1.4 in c:\users\shaki\anaconda3\lib\site-packages (from watermark) (7.0.1)
Requirement already satisfied: setuptools in c:\users\shaki\anaconda3\lib\site-packages (from watermark) (69.5.1)
Requirement already satisfied: zipp>=0.5 in c:\users\shaki\anaconda3\lib\site-packages (from importlib-metadata>=1.4->watermark) (3.17.0)
Requirement already satisfied: decorator in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (5.1.1)
Requirement already satisfied: jedi>=0.16 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.18.1)
Requirement already satisfied: matplotlib-inline in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.1.6)
Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.41 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (3.0.43)
Requirement already satisfied: pygments>=2.4.0 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (2.15.1)
Requirement already satisfied: stack-data in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.2.0)
Requirement already satisfied: traitlets>=5.13.0 in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (5.14.3)
Requirement already satisfied: colorama in c:\users\shaki\anaconda3\lib\site-packages (from ipython>=6.0->watermark) (0.4.6)
Requirement already satisfied: wcwidth in c:\users\shaki\anaconda3\lib\site-packages (from prompt-toolkit<3.1.0,>=3.0.41->ipython>=6.0->watermark) (0.2.5)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in c:\users\shaki\anaconda3\lib\site-packages (from jedi>=0.16->ipython>=6.0->watermark) (0.8.3)
Requirement already satisfied: executing in c:\users\shaki\anaconda3\lib\site-packages (from stack-data->ipython>=6.0->watermark) (0.8.3)
Requirement already satisfied: asttokens in c:\users\shaki\anaconda3\lib\site-packages (from stack-data->ipython>=6.0->watermark) (2.0.5)
Requirement already satisfied: pure-eval in c:\users\shaki\anaconda3\lib\site-packages (from stack-data->ipython>=6.0->watermark) (0.2.2)
Requirement already satisfied: six in c:\users\shaki\anaconda3\lib\site-packages (from asttokens->stack-data->ipython>=6.0->watermark) (1.16.0)

Step 3: Create the variable AOPWikiSPARQL for the SPARQL wrapper and set the return format to JSON so that the results come back in a structured format that is easy to parse into a dataframe.

AOPWikiSPARQL = SPARQLWrapper("https://aopwiki.rdf.bigcat-bioinformatics.org/sparql/")
AOPWikiSPARQL.setReturnFormat(JSON) 

Step 4: Define the triple components, core types, ontologies and identifiers that are used to codify the semantic data in AOP-Wiki.

triple = ['subject','predicate','object']
coretypes = ['aopo:AdverseOutcomePathway','aopo:KeyEvent','aopo:KeyEventRelationship','ncbitaxon:131567','go:0008150','pato:0001241','pato:0000001','aopo:CellTypeContext','aopo:OrganContext','nci:C54571','cheminf:000000'] 
ontologies = ['http://aopkb.org/aop_ontology#','http://edamontology.org/','http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#','http://purl.bioontology.org/ontology/NCBITAXON/','http://purl.obolibrary.org/obo/MMO','http://purl.obolibrary.org/obo/CL_','http://purl.obolibrary.org/obo/UBERON_','http://purl.obolibrary.org/obo/MI_','http://purl.obolibrary.org/obo/MP_','http://purl.org/commons/record/mesh/','http://purl.obolibrary.org/obo/HP_','http://purl.obolibrary.org/obo/PCO_','http://purl.obolibrary.org/obo/NBO_','http://purl.obolibrary.org/obo/VT_','http://purl.obolibrary.org/obo/PR_','http://purl.obolibrary.org/obo/CHEBI_','http://purl.org/sig/ont/fma/fma','http://xmlns.com/foaf/0.1/','http://www.w3.org/2004/02/skos/core#','http://www.w3.org/2000/01/rdf-schema#','http://www.w3.org/1999/02/22-rdf-syntax-ns#','http://semanticscience.org/resource/CHEMINF_','http://purl.obolibrary.org/obo/GO_','http://purl.org/dc/terms/','http://purl.org/dc/elements/1.1/','http://purl.obolibrary.org/obo/PATO_']
identifiers = []
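The queries below use prefixes such as aopo:, dc: and rdfs: without declaring them, because the AOP-Wiki SPARQL endpoint predefines them. When reusing the same queries against a generic SPARQL client, the prefixes can be declared explicitly. A minimal sketch (the IRIs are taken from the ontologies list above; note that AOP-Wiki RDF maps dc: to Dublin Core elements 1.1, which also appears in that list):

```python
# Explicit PREFIX declarations, only needed when the endpoint does not
# predefine them (the AOP-Wiki endpoint does).
SPARQL_PREFIXES = '''PREFIX aopo: <http://aopkb.org/aop_ontology#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
'''

def with_prefixes(query):
    """Prepend explicit PREFIX declarations to a SPARQL query string."""
    return SPARQL_PREFIXES + query
```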

Section 2: Execution of the first SPARQL query

Step 5: Define and run the SPARQL query to extract the inflammation-related AOPs. Subsequently, convert the results to a pandas dataframe (df1) and display it.

first_sparqlquery = '''SELECT DISTINCT ?AOP ?AOPtitle ?KE ?KEname
WHERE {
  ?KE a aopo:KeyEvent ;
      dc:identifier ?KElookup ;
      dc:title ?KEname .
  ?AOP a aopo:AdverseOutcomePathway ;
       aopo:has_key_event ?KElookup ;
       dc:title ?AOPtitle .

  FILTER regex(?KEname, "inflammation|inflammatory", "i")
}
ORDER BY DESC(?AOP)'''

AOPWikiSPARQL.setQuery(first_sparqlquery)
first_results = AOPWikiSPARQL.query().convert()

first_data = first_results["results"]["bindings"]

# Each binding becomes one row in the dataframe
rows = [{
    "AOP": item["AOP"]["value"],
    "AOPtitle": item["AOPtitle"]["value"],
    "KE": item["KE"]["value"],
    "KEname": item["KEname"]["value"]
} for item in first_data]

df1 = pd.DataFrame(rows)
display(df1)

AOP AOPtitle KE KEname
0 https://identifiers.org/aop/62 AKT2 activation leading to hepatic steatosis https://identifiers.org/aop.events/486 systemic inflammation leading to hepatic steatosis
1 https://identifiers.org/aop/544 Inhibition of neuropathy target esterase leading to delayed neuropathy via increased inflammation https://identifiers.org/aop.events/149 Increase, Inflammation
2 https://identifiers.org/aop/535 Binding and activation of GPER leading to learning and memory impairments https://identifiers.org/aop.events/188 Neuroinflammation
3 https://identifiers.org/aop/511 The AOP framework on ROS-mediated oxidative stress induced vascular disrupting effects https://identifiers.org/aop.events/2009 Activation of inflammation pathway
4 https://identifiers.org/aop/507 Nrf2 inhibition leading to vascular disrupting effects via inflammation pathway https://identifiers.org/aop.events/2009 Activation of inflammation pathway
... ... ... ... ...
67 https://identifiers.org/aop/144 Endocytic lysosomal uptake leading to liver fibrosis https://identifiers.org/aop.events/1493 Increased Pro-inflammatory mediators
68 https://identifiers.org/aop/14 Glucocorticoid Receptor Activation Leading to Increased Disease Susceptibility https://identifiers.org/aop.events/152 Suppression, Inflammatory cytokines
69 https://identifiers.org/aop/12 Chronic binding of antagonist to N-methyl-D-aspartate receptors (NMDARs) during brain development leads to neurodegeneration with impairment in learning and memory in aging https://identifiers.org/aop.events/188 Neuroinflammation
70 https://identifiers.org/aop/115 Epithelial cytotoxicity leading to forestomach tumors (in mouse and rat) https://identifiers.org/aop.events/149 Increase, Inflammation
71 https://identifiers.org/aop/114 HPPD inhibition leading to corneal papillomas and carcinomas (in rat) https://identifiers.org/aop.events/777 Increase, Inflammation (corneal cells)

72 rows × 4 columns
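Note that the same AOP can appear in several rows, once per matching key event. A quick sanity check, not part of the original notebook and sketched here on a toy stand-in for df1, is to compare the number of rows with the number of distinct AOPs:

```python
import pandas as pd

# Toy stand-in for df1: one row per (AOP, key event) pair, so an AOP
# with several inflammation-related KEs appears several times.
toy_df1 = pd.DataFrame({
    'AOP': ['https://identifiers.org/aop/511',
            'https://identifiers.org/aop/511',
            'https://identifiers.org/aop/62'],
    'KEname': ['Activation of inflammation pathway',
               'Neuroinflammation',
               'systemic inflammation leading to hepatic steatosis'],
})

print(len(toy_df1))              # number of (AOP, KE) pairs
print(toy_df1['AOP'].nunique())  # number of distinct AOPs
```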

Section 3: Merging of two datasets

In this section, you will merge two datafiles: df1, which contains the previous SPARQL query results for the inflammation-related AOPs, and table2, a snorql-csv export containing additional AOP annotations that was created by Marvin Martens (supervisor of Shakira Agata). The snorql-csv file has five columns: AOP, AOPName, ao, aotitle and organ, which are merged with df1.

Step 6: You first read the Excel export of the snorql-csv file and convert it to JSON format.

table2= pd.read_excel('snorql-csv-1679052766.894.xlsx', sheet_name='snorql-csv-1679052766.894')
json_table2= table2.to_json()

Step 7: This is followed by reading the JSON-formatted file into a pandas dataframe and verifying the result.

df2= pd.read_json("C:/Users/shaki/Downloads/csvjson.json")

Step 8: Next, you merge the two dataframes based on the shared ‘AOP’ column.

mergedJSONtable= pd.merge(df1,df2, on='AOP')
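pd.merge defaults to an inner join, so AOPs present in only one of the two tables silently disappear. When in doubt, an outer join with indicator=True shows where each row came from. A sketch on toy data (the real call above keeps the default inner join):

```python
import pandas as pd

left = pd.DataFrame({'AOP': ['a1', 'a2'], 'KEname': ['ke1', 'ke2']})
right = pd.DataFrame({'AOP': ['a2', 'a3'], 'organ': ['Liver', 'Brain']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
checked = pd.merge(left, right, on='AOP', how='outer', indicator=True)
print(checked[['AOP', '_merge']])
```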

Section 4: Filtering of results

In this section, you filter the result of section 3 by removing AOPs that are not organ-system based, i.e. that do not belong to brain, kidney, liver or lung.

Step 9: The rows whose 'organ' column contains 'Other' are filtered out so that you only retain AOPs in the four defined organ systems (brain, liver, kidney and lung). This is done with the comparison operator '!=', which keeps only the rows where organ is NOT 'Other'.

filtered_mergedJSONtable = mergedJSONtable[mergedJSONtable['organ'] != 'Other']
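Note that != compares strings case-sensitively, so a lowercase 'other' would survive the filter. A quick check of the remaining organ values, sketched here on toy data, confirms the filter behaves as intended:

```python
import pandas as pd

toy = pd.DataFrame({'AOP': ['a1', 'a2', 'a3', 'a4'],
                    'organ': ['Liver', 'Other', 'Brain', 'Lung']})

# Keep only rows whose organ is not exactly 'Other' (case-sensitive)
filtered = toy[toy['organ'] != 'Other']
print(sorted(filtered['organ'].unique()))
```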

Section 5: Execution of second SPARQL query

In this section, you will run the second SPARQL query to retrieve the upstreamKEs and downstreamKEs of the AOPs retained after filtering in section 4.

Step 10: In preparation for the second SPARQL query, you first collect the URIs of the AOPs so that the execution of the query takes less time and can be automated. The following line selects the 'AOP' column of filtered_mergedJSONtable, wraps each URI in angle brackets and joins them into a single space-separated string.

values_AOPs = " ".join(f"<{AOP}>" for AOP in filtered_mergedJSONtable['AOP'])
print(values_AOPs)
<https://identifiers.org/aop/62> <https://identifiers.org/aop/48> <https://identifiers.org/aop/452> <https://identifiers.org/aop/452> <https://identifiers.org/aop/451> <https://identifiers.org/aop/451> <https://identifiers.org/aop/447> <https://identifiers.org/aop/429> <https://identifiers.org/aop/409> <https://identifiers.org/aop/409> <https://identifiers.org/aop/382> <https://identifiers.org/aop/38> <https://identifiers.org/aop/374> <https://identifiers.org/aop/362> <https://identifiers.org/aop/320> <https://identifiers.org/aop/320> <https://identifiers.org/aop/319> <https://identifiers.org/aop/303> <https://identifiers.org/aop/3> <https://identifiers.org/aop/280> <https://identifiers.org/aop/278> <https://identifiers.org/aop/27> <https://identifiers.org/aop/206> <https://identifiers.org/aop/173> <https://identifiers.org/aop/173> <https://identifiers.org/aop/171> <https://identifiers.org/aop/17> <https://identifiers.org/aop/17> <https://identifiers.org/aop/144> <https://identifiers.org/aop/12>
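As the output shows, the VALUES list contains duplicate URIs, because an AOP occurs once per row of the filtered table. SPARQL accepts duplicates inside a VALUES clause, but deduplicating first keeps the query compact. A sketch (dict.fromkeys removes duplicates while preserving first-seen order):

```python
aops = ['https://identifiers.org/aop/62',
        'https://identifiers.org/aop/452',
        'https://identifiers.org/aop/452',
        'https://identifiers.org/aop/451']

# Deduplicate while keeping the original order
unique_aops = list(dict.fromkeys(aops))
values_unique = " ".join(f"<{aop}>" for aop in unique_aops)
print(values_unique)
```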

Step 11: Now you run the second SPARQL query to retrieve the KE, KEID and KEtitle for the respective AOPs. Because the query is built as a Python f-string, a single pair of curly brackets marks the position where values_AOPs is inserted, while the SPARQL query's own curly brackets are doubled ({{ and }}) so that Python does not interpret them as placeholders (escaping).

second_sparqlquery = f'''
SELECT DISTINCT ?AOP ?KE ?KEID ?KEtitle
WHERE {{
  VALUES ?AOP {{ {values_AOPs} }}
  ?AOP a aopo:AdverseOutcomePathway ;
       dc:title ?AOPName ;
       aopo:has_key_event ?KE .
  ?KE a aopo:KeyEvent ;
      rdfs:label ?KEID ;
      dc:title ?KEtitle .
}}
'''

AOPWikiSPARQL.setQuery(second_sparqlquery)
second_results = AOPWikiSPARQL.query().convert()

second_data = second_results["results"]["bindings"]

# Each binding becomes one row in the dataframe
rows = [{
    "AOP": item["AOP"]["value"],
    "KE": item["KE"]["value"],
    "KEID": item["KEID"]["value"],
    "KEtitle": item["KEtitle"]["value"]
} for item in second_data]

df3 = pd.DataFrame(rows)
display(df3)

AOP KE KEID KEtitle
0 https://identifiers.org/aop/447 https://identifiers.org/aop.events/105 KE 105 Inhibition, Mitochondrial Electron Transport Chain Complexes
1 https://identifiers.org/aop/27 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
2 https://identifiers.org/aop/303 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
3 https://identifiers.org/aop/319 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
4 https://identifiers.org/aop/382 https://identifiers.org/aop.events/1115 KE 1115 Increase, Reactive oxygen species
... ... ... ... ...
185 https://identifiers.org/aop/62 https://identifiers.org/aop.events/459 KE 459 Increased, Liver Steatosis
186 https://identifiers.org/aop/62 https://identifiers.org/aop.events/484 KE 484 Activation, AKT2
187 https://identifiers.org/aop/62 https://identifiers.org/aop.events/486 KE 486 systemic inflammation leading to hepatic steatosis
188 https://identifiers.org/aop/447 https://identifiers.org/aop.events/759 KE 759 Increased, Kidney Failure
189 https://identifiers.org/aop/451 https://identifiers.org/aop.events/780 KE 780 Increase, Cytotoxicity (epithelial cells)

190 rows × 4 columns

Section 6: Execution of third SPARQL query

In this section, you will run the third SPARQL query to retrieve the KERs that link the upstreamKEs and downstreamKEs of the AOPs. This approach allows for easier integration of the output into the nodetable and edgetable of the AOP network (1).

Step 12: You run the following query to retrieve the upstreamKEs, downstreamKEs and KERs.

third_sparqlquery = f'''
SELECT DISTINCT ?AOP ?upstreamKE ?upstreamKEtitle ?downstreamKE ?downstreamKEtitle ?KER ?KERID
WHERE {{
  VALUES ?AOP {{ {values_AOPs} }}
  ?AOP a aopo:AdverseOutcomePathway ;
       dc:title ?AOPName ;
       aopo:has_key_event_relationship ?KER .
  ?KER a aopo:KeyEventRelationship ;
       rdfs:label ?KERID .

  ?KER aopo:has_upstream_key_event ?upstreamKE .
  ?upstreamKE dc:title ?upstreamKEtitle .

  ?KER aopo:has_downstream_key_event ?downstreamKE .
  ?downstreamKE dc:title ?downstreamKEtitle .
}}
'''
AOPWikiSPARQL.setQuery(third_sparqlquery)
third_results = AOPWikiSPARQL.query().convert()

third_data = third_results["results"]["bindings"]

# Each binding becomes one row in the dataframe
rows = [{
    "AOP": item["AOP"]["value"],
    "upstreamKE": item["upstreamKE"]["value"],
    "downstreamKE": item["downstreamKE"]["value"],
    "KER": item["KER"]["value"],
    "KERID": item["KERID"]["value"],
    "upstreamKEtitle": item["upstreamKEtitle"]["value"],
    "downstreamKEtitle": item["downstreamKEtitle"]["value"]
} for item in third_data]

df4 = pd.DataFrame(rows)

Section 7: Merging results from section 4-6

In this section, you will merge the dataframes: filtered_mergedJSONtable (result of section 4), df3 (result of section 5) and df4 (result of section 6).

Step 13: The three dataframes are merged to obtain the final table that will be used to define the nodes and edges of the network in Py4Cytoscape. The final table is the merged version of:

  • filtered_mergedJSONtable (AOP SPARQL query)
  • df3 (KE SPARQL query)
  • df4 (KER/KERID SPARQL query)

first_merge = pd.merge(df3, df4)
final_result = pd.merge(filtered_mergedJSONtable, first_merge)
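Note that pd.merge without an on argument joins on every column name the two frames share; for df3 and df4 that is the 'AOP' column. A minimal sketch of that behaviour on toy frames:

```python
import pandas as pd

df_a = pd.DataFrame({'AOP': ['a1', 'a1'], 'KE': ['k1', 'k2']})
df_b = pd.DataFrame({'AOP': ['a1'], 'KER': ['r1']})

# No 'on' argument: pandas joins on the shared column(s), here 'AOP'
merged = pd.merge(df_a, df_b)
print(merged)
```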

Step 14: Lastly, the final_result table is stored in JSON format on your machine in preparation for the next Jupyter Notebook.

final_result.to_json('final_result.json')
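To confirm that the table survives the round trip, the JSON can be read back and compared. A sketch using an in-memory buffer rather than the actual file on disk:

```python
import io
import pandas as pd

df = pd.DataFrame({'AOP': ['a1'], 'organ': ['Liver']})

# Write to a JSON string and read it back, mimicking to_json/read_json on disk
buffer = io.StringIO(df.to_json())
restored = pd.read_json(buffer)
print(restored.shape)
```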

Section 8: Metadata

Step 15: Finally, the metadata belonging to this Jupyter Notebook is displayed, which contains the version numbers of the packages and the system set-up for interested users. This uses the packages watermark and print-versions.

%load_ext watermark
!pip install print-versions
Requirement already satisfied: print-versions in c:\users\shaki\anaconda3\lib\site-packages (0.1.0)
%watermark
Last updated: 2025-04-26T20:54:28.526670+02:00

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.25.0

Compiler    : MSC v.1938 64 bit (AMD64)
OS          : Windows
Release     : 11
Machine     : AMD64
Processor   : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores   : 8
Architecture: 64bit
from print_versions import print_versions
print_versions(globals())
json==2.0.9
ipykernel==6.28.0
pandas==2.2.2
SPARQLWrapper==2.0.0

References:

  1. Martens M, Evelo CT, Willighagen EL. Providing Adverse Outcome Pathways from the AOP-Wiki in a Semantic Web Format to Increase Usability and Accessibility of the Content. Appl In Vitro Toxicol. 2022 Mar 1;8(1):2-13. doi: 10.1089/aivt.2021.0010. Epub 2022 Mar 17. PMID: 35388368; PMCID: PMC8978481. Code: https://github.com/marvinm2/AOPWikiRDF
  2. msx. Package for listing version of packages used in a Jupyter Notebook [Internet]. Stack Overflow. 2016. Available from: https://stackoverflow.com/questions/40428931/package-for-listing-version-of-packages-used-in-a-jupyter-notebook