Part 5: Visualization of transcriptomics expression datasets in the enriched AOP network part 2

The AOP project ► Key objective 2

Author: Shakira Agata

This Jupyter notebook describes the steps needed to map transcriptomics datasets onto the constructed enriched AOP network. For this notebook, open-license transcriptomics datasets were obtained from ArrayExpress and Gene Expression Omnibus (GEO). These datasets were preprocessed and then analyzed statistically to identify differentially expressed genes (DEGs). The resulting tables of differential gene expression data were subsequently mapped onto the network. This notebook is subdivided into the following six sections:

Section 1: System preparation
Section 2: Retrieval of the enriched AOP network
Section 3: Adaptation of gene node color within the enriched AOP network
Section 4: Mapping of dataset: GSE69844
- Section 4.1 Bisphenol A
- Section 4.2 Farnesol
- Section 4.3 Tetrachlorodibenzo p-dioxin
- Section 4.4 Troglitazone
- Section 4.5 Valproic acid
Section 5: Mapping of dataset: E-GEOD-69851
- Section 5.1 ACR exposure time 1
- Section 5.2 ACR exposure time 2
- Section 5.3 MA exposure time 1
- Section 5.4 MA exposure time 2
- Section 5.5 CP exposure time 1
- Section 5.6 CP exposure time 2
Section 6: Metadata

Section 1: System preparation

In this section, you will import the packages and tools required for this Jupyter notebook.

step 1: You import pandas, py4cytoscape and the style mapping functions of py4cytoscape.

import pandas as pd
import glob

import py4cytoscape as p4c
p4c.cytoscape_ping()
p4c.cytoscape_version_info()

You are connected to Cytoscape!





{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.1',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}

from py4cytoscape import get_node_color
from py4cytoscape import set_node_color_mapping
from py4cytoscape import gen_node_color_map
from py4cytoscape import set_edge_color_default
from py4cytoscape import set_node_color_default
from py4cytoscape import set_edge_source_arrow_shape_default
from py4cytoscape import set_edge_target_arrow_shape_default
from py4cytoscape import get_arrow_shapes
from py4cytoscape import get_edge_target_arrow_shape
from py4cytoscape import set_edge_target_arrow_shape_mapping
from py4cytoscape import gen_edge_arrow_map
from py4cytoscape import select_nodes
from py4cytoscape import get_table_value
from py4cytoscape import get_network_suid
from py4cytoscape import clear_selection
from py4cytoscape import set_node_color_bypass
from py4cytoscape import set_edge_color_bypass
from py4cytoscape import set_edge_target_arrow_color_default
from py4cytoscape import set_node_size_bypass
from py4cytoscape import create_subnetwork

Section 2: Retrieval of the enriched AOP network

In this section, you will reopen the enriched AOP network that was saved in the previous notebook. Afterwards, you will change the node color of genes and adapt the style for easier interpretation of the upcoming results. This is needed in preparation for the mapping of transcriptomics datasets. These datasets may contain genes that are not present in the built AOP network, and the gene nodes should therefore receive a distinct color to correctly inform the user.

step 2: You open the session you saved in the previous Jupyter notebook.

p4c.open_session('Agata,S.-Part4-Complete Molecular inflammation-process related AOP network.cys')

Opening C:\Users\shaki\Downloads\Agata,S.-Part4-Complete Molecular inflammation-process related AOP network.cys...





{}

Section 3: Adaptation of gene node color within the enriched AOP network

step 3: You can change the style with the following commands.

style_name = "default"
defaults = {'NODE_SHAPE': "ELLIPSE", 'NODE_SIZE': 20, 'EDGE_TRANSPARENCY': 140, 'NODE_LABEL_POSITION': "C,C,c,0.00,0.00"}
nodeLabels = p4c.map_visual_property('node label', 'name', 'p') 
edgeWidth = p4c.map_visual_property('edge width', 'weight', 'p') 
arrowShapes = p4c.map_visual_property('Edge Target Arrow Shape','interaction', 'd')
p4c.create_visual_style(style_name, defaults, [nodeLabels, edgeWidth])
p4c.set_visual_style(style_name)

{'message': 'Visual Style applied.'}

set_node_color_default('#a7a5a5',style_name='default')
set_edge_color_default('#01e735', style_name='default')

''

p4c.clone_network()
p4c.rename_network('clone-GSE69844')

{'network': 235771, 'title': 'clone-GSE69844'}

Section 4: Mapping of dataset: GSE69844

In this section, you will map the transcriptomics expression data of dataset GSE69844. This is done in a similar fashion as in the previous section, but streamlined because of the high number of data files. In preparation for this section, you must first download the data files of the chemicals into separate folders.
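Because each chemical comes with several data files, a small helper can list them before they are processed one by one. The sketch below is only an illustration: the folder layout (one sub-folder per chemical that holds `.tsv` files) and the function name `find_expression_files` are assumptions and are not part of the original workflow.

```python
import glob
import os


def find_expression_files(root_folder, pattern="*.tsv"):
    """Return {chemical folder name: sorted list of matching files}.

    Assumes (hypothetically) one sub-folder per chemical under root_folder.
    """
    files_per_chemical = {}
    for folder in sorted(glob.glob(os.path.join(root_folder, "*"))):
        if not os.path.isdir(folder):
            continue  # skip loose files in the root folder
        matches = sorted(glob.glob(os.path.join(folder, pattern)))
        if matches:
            files_per_chemical[os.path.basename(folder)] = matches
    return files_per_chemical
```

The returned dictionary can then be looped over so that every file goes through the same import, filter and mapping steps.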

4.1 Bisphenol A

4.1.1 Concentration 1

step 4: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

BisphenolA_concentration1= pd.read_csv('GSE69844.BisphenolA-1uM.tsv',sep='\t')
Adjusted_BisphenolA_concentration1= BisphenolA_concentration1[BisphenolA_concentration1['padj'] < 0.05]
BisphenolA_concentration_1=Adjusted_BisphenolA_concentration1.drop('ID', axis=1)
BisphenolA_Concentration1 = BisphenolA_concentration_1[['Entrez.Gene','Gene.Symbol','padj', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration1.to_excel('GSE69844.BisphenolA-concentration1.xlsx',index=False)
BisphenolA_Concentration1

	Entrez.Gene	Gene.Symbol	padj	t	B	logFC	GB_LIST	SPOT_ID
0	28996	HIPK2	0.000026	-12.821942	12.209372	-0.593550	NM_001113239,NM_022740,XM_001716827,XM_925800	NaN
1	87	ACTN1	0.000026	-12.667672	12.062264	-0.585774	NM_001102,NM_001130004,NM_001130005	NaN
2	1455	CSNK1G2	0.000029	-12.228128	11.631158	-0.489035	NM_001319	NaN
3	2316	FLNA	0.000038	-11.657262	11.043619	-0.444861	NM_001110556,NM_001456	NaN
4	23524	SRRM2	0.000038	-11.409780	10.778745	-0.554962	NM_016333	NaN
...	...	...	...	...	...	...	...	...
3472	7485	WRB	0.049943	3.424377	-1.958139	0.136566	NM_001146218,NM_004627	NaN
3473	81607	PVRL4	0.049943	-3.424351	-1.958192	-0.152348	NM_030916	NaN
3474	91057	CCDC34	0.049943	3.424349	-1.958195	0.170781	NM_030771,NM_080654	NaN
3475	79780	CCDC82	0.049976	3.423893	-1.959112	0.138349	NM_024725	NaN
3476	2108	ETFA	0.049982	3.423700	-1.959501	0.142652	NM_000126,NM_001127716	NaN

3477 rows × 8 columns
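The import and filter step shown above is repeated for every chemical and concentration, so it could be wrapped in one function. This is a minimal sketch under stated assumptions: the function name `filter_significant` and the small example table are made up for illustration (the values are not taken from GSE69844), and the default threshold of 0.05 simply mirrors the cell above.

```python
import io

import pandas as pd


def filter_significant(table, p_column="padj", alpha=0.05):
    """Keep rows with an adjusted p-value below alpha; drop the probe 'ID' column if present."""
    significant = table[table[p_column] < alpha]
    return significant.drop(columns=["ID"], errors="ignore").reset_index(drop=True)


# Tiny made-up table, only to show the behaviour of the helper.
example = pd.read_csv(io.StringIO(
    "ID\tGene.Symbol\tpadj\tlogFC\n"
    "p1\tGENE_A\t0.01\t-0.5\n"
    "p2\tGENE_B\t0.20\t0.3\n"
), sep="\t")
filtered = filter_significant(example)
```

For files that store the adjusted p-value under another name, the column can be passed explicitly, for example `filter_significant(table, p_column='adj.P.Val')`.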

step 5: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneID’ for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='Agata,S.-Part4-Molecular inflammation-process related AOP network')

{'mappedTables': [388893, 388931]}
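The import only adds values to nodes whose key matches a key in the file (to my understanding, the first column of the file is used as the data key unless another column is specified). A quick check of how many keys overlap can therefore be useful before loading. The sketch below is a plain Python illustration; the function name `count_key_matches` is hypothetical, and it assumes both key lists have already been retrieved as ordinary Python sequences.

```python
def count_key_matches(data_keys, node_keys):
    """Count how many data keys occur among the node-table keys.

    Both sides are compared as stripped strings, so 2316 and "2316" match.
    Missing values given as None are ignored.
    """
    def normalise(keys):
        return {str(key).strip() for key in keys if key is not None}

    return len(normalise(data_keys) & normalise(node_keys))
```

A result of zero would suggest that the key column in the file and the table_key_column in the node table hold different identifier types, for example gene symbols on one side and Entrez gene IDs on the other.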

step 6: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 7: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
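The midpoint computed above equals zero only when the minimum and maximum are symmetric around zero. If white should instead mark a log fold change of exactly zero, symmetric breakpoints are one option. The following is a sketch of that alternative, not the method used in this notebook; the function name `symmetric_breakpoints` is hypothetical.

```python
import math


def symmetric_breakpoints(values):
    """Return (low, mid, high) with mid fixed at 0 and low == -high.

    NaN or None entries (nodes without expression data) are skipped.
    """
    finite = [v for v in values if v is not None and not math.isnan(v)]
    if not finite:
        raise ValueError("no expression values available")
    limit = max(abs(v) for v in finite)
    return -limit, 0.0, limit
```

With this variant, the three returned values would take the place of the blue, white and red breakpoints, at the cost of not using the full color range when the data are skewed to one side.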

step 8: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''
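Since steps 6 to 8 are repeated for every condition, the arguments of the color mapping could be assembled in one place. The sketch below only builds a dictionary and does not contact Cytoscape; the function name `build_color_mapping_args` is hypothetical, and the keyword names follow my reading of the `set_node_color_mapping` signature, which should be checked against the installed py4cytoscape version.

```python
def build_color_mapping_args(min_value, max_value, column="logFC", style_name="default"):
    """Return keyword arguments for a blue-white-red continuous node color mapping."""
    midpoint = min_value + (max_value - min_value) / 2
    return {
        "table_column": column,
        "table_column_values": [min_value, midpoint, max_value],
        "colors": ["#0000FF", "#FFFFFF", "#FF0000"],
        "mapping_type": "c",  # 'c' = continuous mapping, as in the cell above
        "style_name": style_name,
    }


args = build_color_mapping_args(-1.0, 3.0)
```

In the notebook, the dictionary could then be passed on as `p4c.set_node_color_mapping(**args)`.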

4.1.2 Concentration 2

step 9: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

BisphenolA_concentration2= pd.read_csv('GSE69844.BisphenolAConcentration2.tsv',sep='\t')
Adjusted_BisphenolA_concentration2= BisphenolA_concentration2[BisphenolA_concentration2['adj.P.Val'] < 0.05]
BisphenolA_concentration_2=Adjusted_BisphenolA_concentration2.drop('ID', axis=1)
BisphenolA_Concentration2 = BisphenolA_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration2.to_excel('GSE69844.BisphenolA-concentration2.xlsx',index=False)
BisphenolA_Concentration2

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	COL8A1	0.0172	-9.304198	3.89521	-0.381	NM_001850,NM_020351	NaN
1	S100A2	0.0414	8.109804	3.14555	0.330	NM_005978	NaN
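The first Bisphenol A file stores the adjusted p-value in a column named `padj`, whereas this file uses `adj.P.Val`. Renaming such columns to one common name right after import would allow the same filter code to be reused. The sketch below assumes that these two names are the only variants; the alias table and the function name `harmonise_columns` are hypothetical.

```python
import pandas as pd

# Hypothetical mapping from alternative column names to one common name.
COLUMN_ALIASES = {"padj": "adj.P.Val"}


def harmonise_columns(table):
    """Return a copy of the table with known alias columns renamed."""
    return table.rename(columns=COLUMN_ALIASES)
```

Columns that are not listed in the alias table are left unchanged, and the original dataframe is not modified.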

step 10: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneID’ for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID')

{'mappedTables': [388893, 388931]}

step 11: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 12: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 13: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.1.3 Concentration 3

step 14: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

BisphenolA_concentration3= pd.read_csv('GSE69844.BisphenolAConcentration3.tsv',sep='\t')
Adjusted_BisphenolA_concentration3= BisphenolA_concentration3[BisphenolA_concentration3['adj.P.Val'] < 0.05]
BisphenolA_concentration_3=Adjusted_BisphenolA_concentration3.drop('ID', axis=1)
BisphenolA_Concentration3 = BisphenolA_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration3.to_excel('GSE69844.BisphenolA-concentration3.xlsx',index=False)
BisphenolA_Concentration3

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	COL8A1	0.000002	-15.333380	15.028626	-0.649	NM_001850,NM_020351	NaN
1	ZBTB16	0.000003	-14.358816	14.190854	-0.754	NM_001018011,NM_006006	NaN
2	PDE4DIP	0.000003	-14.169566	14.020061	-0.640	NM_001002810,NM_001002811,NM_001002812,NM_0146...	NaN
3	GGT5	0.000003	-13.645523	13.532532	-0.596	NM_001099781,NM_001099782,NM_004121	NaN
4	ZFP36L2	0.000003	-13.583284	13.473166	-0.520	NM_006887	NaN
...	...	...	...	...	...	...	...
5726	PSAT1	0.049904	3.176887	-2.554956	0.136	NM_021154,NM_058179	NaN
5727	PXN	0.049921	3.176598	-2.555540	0.170	NM_001080855,NM_002859,NM_025157	NaN
5728	SET	0.049921	-3.176562	-2.555612	-0.125	NM_001122821,NM_003011	NaN
5729	SIGMAR1	0.049943	-3.176276	-2.556190	-0.153	NM_005866,NM_147157	NaN
5730	SEMA3B	0.049965	-3.175985	-2.556778	-0.133	NM_001005914,NM_004636	NaN

5731 rows × 7 columns

step 15: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneID’ for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 16: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 17: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 18: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.2 Farnesol

4.2.1 Concentration 1

step 19: You first import the expression table into a new dataframe and filter for the significant genes. It turns out that concentration 1 of Farnesol does not yield significant genes and thus cannot be mapped in the AOP network.

Farnesol_concentration1= pd.read_csv('GSE69844.FarnesolConcentration1.tsv',sep='\t')
Adjusted_Farnesol_concentration1= Farnesol_concentration1[Farnesol_concentration1['adj.P.Val'] < 0.05]
Farnesol_concentration_1=Adjusted_Farnesol_concentration1.drop('ID', axis=1)
Farnesol_Concentration1 = Farnesol_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration1.to_excel('GSE69844.Farnesol-concentration1.xlsx',index=False)
Farnesol_Concentration1

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
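When a filtered table is empty, as it is here, the export and mapping steps can be skipped for that condition. The sketch below shows one way to separate mappable from empty conditions; the function name `conditions_to_map` and the dictionary layout (condition name mapped to its filtered dataframe) are assumptions made for illustration.

```python
import pandas as pd


def conditions_to_map(tables):
    """Split {condition name: DataFrame} into mappable and skipped condition names."""
    mappable = [name for name, table in tables.items() if not table.empty]
    skipped = [name for name, table in tables.items() if table.empty]
    return mappable, skipped
```

Only the names in the first list would then be passed on to the export and mapping steps.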

4.2.2 Concentration 2

step 20: You first import the expression table into a new dataframe and filter for the significant genes. It turns out that concentration 2 of Farnesol does not yield significant genes and thus cannot be mapped in the AOP network.

Farnesol_concentration2= pd.read_csv('GSE69844.FarnesolConcentration2.tsv',sep='\t')
Adjusted_Farnesol_concentration2= Farnesol_concentration2[Farnesol_concentration2['adj.P.Val'] < 0.05]
Farnesol_concentration_2=Adjusted_Farnesol_concentration2.drop('ID', axis=1)
Farnesol_Concentration2 = Farnesol_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration2.to_excel('GSE69844.Farnesol-concentration2.xlsx',index=False)
Farnesol_Concentration2

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID

4.2.3 Concentration 3

step 21: You first import the expression table into a new dataframe.

Farnesol_concentration3= pd.read_csv('GSE69844.FarnesolConcentration3.tsv',sep='\t')
Adjusted_Farnesol_concentration3= Farnesol_concentration3[Farnesol_concentration3['adj.P.Val'] < 0.05]
Farnesol_concentration_3=Adjusted_Farnesol_concentration3.drop('ID', axis=1)
Farnesol_Concentration3 = Farnesol_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration3.to_excel('GSE69844.Farnesol-concentration3.xlsx',index=False)
Farnesol_Concentration3

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	IGFBP1	0.00101	13.437237	6.97653	0.709	NM_000596	NaN
1	CBX5	0.01410	-9.847783	5.16687	-0.627	NM_001127321,NM_001127322,NM_012117	NaN
2	---	0.01543	-9.388084	4.85421	-0.569	NaN	--AFFX-HUMRGE/M10098_3
3	ANGPTL4	0.03269	8.326084	4.03872	0.701	NM_001039667,NM_139314	NaN
4	FNDC3B	0.03269	-8.294095	4.01196	-0.439	NM_001135095,NM_022763	NaN

step 22: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneID’ for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Farnesol-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 23: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

Log2Foldchange_column

	logFC
389124	NaN
401415	NaN
393220	NaN
401410	0.153
389120	NaN
...	...
401405	0.158
389112	NaN
401400	NaN
393210	0.370
397305	NaN

2952 rows × 1 columns
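The column printed above contains values such as 0.153 and 0.370 that do not occur in the five-row Farnesol table, which suggests that `logFC` values loaded for earlier conditions are still present in the node table. One possible way to keep conditions apart is to give each condition its own column name before export. This is a sketch of that idea, not part of the original workflow; the function name `tag_condition` and the suffix format are hypothetical.

```python
import pandas as pd


def tag_condition(table, condition, value_columns=("logFC",)):
    """Return a copy with each value column renamed to '<column>_<condition>'."""
    renamed = {
        column: f"{column}_{condition}"
        for column in value_columns
        if column in table.columns
    }
    return table.rename(columns=renamed)
```

The retrieval of the column and the color mapping would then refer to the tagged name, for example `logFC_Farnesol_c3`, instead of `logFC`.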

step 24: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 25: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.4 Tetrachlorodibenzo p-dioxin

4.4.1 Concentration 1

step 26: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

Tpdioxin_concentration1= pd.read_csv('GSE69844.TpdioxinConcentration1.tsv',sep='\t')
Adjusted_Tpdioxin_concentration1= Tpdioxin_concentration1[Tpdioxin_concentration1['adj.P.Val'] < 0.05]
Tpdioxin_concentration_1=Adjusted_Tpdioxin_concentration1.drop('ID', axis=1)
Tpdioxin_Concentration1 = Tpdioxin_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration1.to_excel('GSE69844.Tpdioxin-concentration1.xlsx',index=False)
Tpdioxin_Concentration1

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	CYP1B1	0.000628	54.309631	3.33321	3.630	NM_000104	NaN
1	CYP1B1	0.002197	37.892762	3.22492	3.550	NM_000104	NaN
2	CYP1B1	0.002271	34.935858	3.18861	3.270	NM_000104	NaN
3	IER3	0.003985	29.839182	3.10077	1.430	NM_003897	NaN
4	CYP1A1	0.004145	27.446373	3.04317	3.280	NM_000499	NaN
5	CYP1B1	0.004145	27.033615	3.03179	3.710	NM_000104	NaN
6	CYP1A1	0.004145	26.696606	3.02214	3.190	NM_000499	NaN
7	HSD17B2	0.017668	19.876908	2.72569	0.923	NM_002153	NaN
8	SLC7A11	0.041721	16.244113	2.42797	0.782	NM_014331	NaN
9	GDF15 /// LOC100292463	0.041721	16.141041	2.41710	0.890	NM_004864,XM_002345162	NaN
10	TIPARP	0.041721	15.940321	2.39544	0.708	NM_001184717,NM_001184718,NM_015508	NaN

step 27: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneName’ for table_key_column, as you want the expression data to be matched to the gene names that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 28: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 29: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 30: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.4.2 Concentration 2

step 31: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

Tpdioxin_concentration2= pd.read_csv('GSE69844.TpdioxinConcentration2.tsv',sep='\t')
Adjusted_Tpdioxin_concentration2= Tpdioxin_concentration2[Tpdioxin_concentration2['adj.P.Val'] < 0.05]
Tpdioxin_concentration_2=Adjusted_Tpdioxin_concentration2.drop('ID', axis=1)
Tpdioxin_Concentration2 = Tpdioxin_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration2.to_excel('GSE69844.Tpdioxin-concentration2.xlsx',index=False)
Tpdioxin_Concentration2

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	CYP1B1	0.00381	34.144660	-1.85	3.223325	NM_000104	NaN
1	CYP1B1	0.00381	32.206372	-1.86	3.642847	NM_000104	NaN
2	CYP1B1	0.00978	25.244475	-1.87	3.359260	NM_000104	NaN

step 32: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneName’ for table_key_column, as you want the expression data to be matched to the gene names that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 33: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 34: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 35: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.4.3 Concentration 3

step 36: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

Tpdioxin_concentration3= pd.read_csv('GSE69844.TpdioxinConcentration3.tsv',sep='\t')
Adjusted_Tpdioxin_concentration3= Tpdioxin_concentration3[Tpdioxin_concentration3['adj.P.Val'] < 0.05]
Tpdioxin_concentration_3=Adjusted_Tpdioxin_concentration3.drop('ID', axis=1)
Tpdioxin_Concentration3 = Tpdioxin_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration3.to_excel('GSE69844.Tpdioxin-concentration3.xlsx',index=False)
Tpdioxin_Concentration3

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	CYP1B1	0.000035	66.355926	7.262273	3.951611	NM_000104	NaN
1	CYP1B1	0.000050	55.772416	7.162770	3.734094	NM_000104	NaN
2	CYP1A1	0.000060	50.590763	7.091598	3.380474	NM_000499	NaN
3	CYP1B1	0.000069	47.138520	7.031778	3.502514	NM_000104	NaN
4	SERPINB2	0.000145	40.135573	6.865297	3.556188	NM_001143818,NM_002575	NaN
...	...	...	...	...	...	...	...
145	MPHOSPH6	0.049924	8.406946	1.666651	0.409146	NM_005792	NaN
146	RUNX1	0.049924	8.402542	1.663899	0.420556	NM_001001890,NM_001122607,NM_001754	NaN
147	A1CF	0.049924	-8.400467	1.662601	-0.490591	NM_014576,NM_138932,NM_138933	NaN
148	ABLIM1	0.049924	-8.394971	1.659163	-0.487655	NM_001003407,NM_001003408,NM_002313,NM_006720	NaN
149	GNA13	0.049924	8.386056	1.653580	0.544947	NM_006572	NaN

150 rows × 7 columns

step 37: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneName’ for table_key_column, as you want the expression data to be matched to the gene names that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 38: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 39: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 40: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.5 Valproic acid

4.5.1 Concentration 1

step 41: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.

Valproicacid_concentration1= pd.read_csv('GSE69844.ValproicacidConcentration1.tsv',sep='\t')
Adjusted_Valproicacid_concentration1= Valproicacid_concentration1[Valproicacid_concentration1['adj.P.Val'] < 0.05]
Valproicacid_concentration_1=Adjusted_Valproicacid_concentration1.drop('ID', axis=1)
Valproicacid_Concentration1 = Valproicacid_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration1.to_excel('GSE69844.Valproicacid-concentration1.xlsx',index=False)
Valproicacid_Concentration1

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	CXorf26	0.00223	-28.40	8.250030	-1.090	NM_016500	NaN
1	ONECUT2	0.00223	26.20	7.967265	1.310	NM_004852	NaN
2	KEAP1	0.00275	-22.30	7.312031	-1.190	NM_012289,NM_203500	NaN
3	ITPRIPL2	0.00275	-22.30	7.302028	-0.990	NM_001034841,NR_028028	NaN
4	TMEM170B	0.00275	22.00	7.241510	1.470	NM_001100829	NaN
...	...	...	...	...	...	...	...
4499	MTF2	0.04987	4.26	-2.075831	0.294	NM_001164391,NM_001164392,NM_001164393,NM_007358	NaN
4500	CDK5RAP1	0.04994	4.26	-2.077623	0.230	NM_016082,NM_016408	NaN
4501	DPF2	0.04994	-4.26	-2.077706	-0.233	NM_006268	NaN
4502	ROR2	0.04998	4.26	-2.078890	0.217	NM_004560	NaN
4503	SLC9A8	0.04998	-4.26	-2.079096	-0.222	NM_015266	NaN

4504 rows × 7 columns

step 42: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the expression table contains the column names. You also select ‘node’ for table and ‘CTL.GeneID’ for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Valproicacid-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 43: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 44: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 45: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.5.2 Concentration 2

step 46: You first import the expression table into a new dataframe and filter for the significant genes. It turns out that concentration 2 of Valproic acid does not yield significant genes and thus cannot be mapped in the AOP network.

Valproicacid_concentration2= pd.read_csv('GSE69844.ValproicacidConcentration2.tsv',sep='\t')
Adjusted_Valproicacid_concentration2= Valproicacid_concentration2[Valproicacid_concentration2['adj.P.Val'] < 0.05]
Valproicacid_concentration_2=Adjusted_Valproicacid_concentration2.drop('ID', axis=1)
Valproicacid_Concentration2 = Valproicacid_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration2.to_excel('GSE69844.Valproicacid-concentration2.xlsx',index=False)
Valproicacid_Concentration2

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID

3.5.3 Concentration 3

step 47: You first import the expression table into a new dataframe and filter for the significant genes (adj.P.Val < 0.05). It turns out that concentration 3 of Valproic acid does not yield any significant genes and thus can’t be mapped in the AOP network.

Valproicacid_concentration3= pd.read_csv('GSE69844.ValproicacidConcentration3.tsv',sep='\t')
Adjusted_Valproicacid_concentration3= Valproicacid_concentration3[Valproicacid_concentration3['adj.P.Val'] < 0.05]
Valproicacid_concentration_3=Adjusted_Valproicacid_concentration3.drop('ID', axis=1)
Valproicacid_Concentration3 = Valproicacid_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration3.to_excel('GSE69844.Valproicacid-concentration3.xlsx',index=False)
Valproicacid_Concentration3

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
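Because an empty table cannot be mapped, it can help to detect this case before writing the Excel file. The following is a minimal sketch, assuming the same adj.P.Val column and 0.05 cutoff used above. The function name significant_genes and the small example table are hypothetical.

```python
import pandas as pd


def significant_genes(table, p_column="adj.P.Val", cutoff=0.05):
    """Return the rows below the cutoff, or None when no gene is significant."""
    significant = table[table[p_column] < cutoff]
    if significant.empty:
        return None
    return significant


# Illustrative values only, not taken from GSE69844
example = pd.DataFrame({"Gene.Symbol": ["A", "B"], "adj.P.Val": [0.20, 0.70]})
result = significant_genes(example)
if result is None:
    print("No significant genes: skip the export and the mapping steps.")
```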

4.7 Troglitazone

4.7.1 Concentration 1

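step 48: You first import the expression table into a new dataframe, filter for the significant genes (adj.P.Val < 0.05), drop the ID column and keep only the columns needed for the mapping. The final table is exported to an Excel file so that it can be used as input for the py4cytoscape function.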

Troglitazone_concentration1= pd.read_csv('GSE69844.TroglitazoneConcentration1.tsv',sep='\t')
Adjusted_Troglitazone_concentration1= Troglitazone_concentration1[Troglitazone_concentration1['adj.P.Val'] < 0.05]
Troglitazone_concentration_1=Adjusted_Troglitazone_concentration1.drop('ID', axis=1)
Troglitazone_Concentration1 = Troglitazone_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration1.to_excel('GSE69844.Troglitazone-concentration1.xlsx',index=False)
Troglitazone_Concentration1

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	FABP4	2.170000e-07	26.70	14.69249	2.278158	NM_001442	NaN
1	CSNK1G2	1.510000e-04	-14.40	10.27064	-0.585872	NM_001319	NaN
2	ACTN1	2.410000e-04	-13.30	9.59510	-0.580898	NM_001102,NM_001130004,NM_001130005	NaN
3	COL8A1	3.770000e-04	-12.50	9.00330	-0.629488	NM_001850,NM_020351	NaN
4	SRRM2	3.990000e-04	-12.20	8.77591	-0.730824	NM_016333	NaN
...	...	...	...	...	...	...	...
3118	PHF20	4.980000e-02	-3.68	-1.74472	-0.165499	NM_016436	NaN
3119	CTGF	4.980000e-02	-3.68	-1.74515	-0.219036	NM_001901	NaN
3120	ADNP2	4.980000e-02	-3.68	-1.74565	-0.189624	NM_014913	NaN
3121	ATP2A2	4.990000e-02	-3.68	-1.74633	-0.165539	NM_001135765,NM_001681,NM_170665	NaN
3122	C1orf198	4.990000e-02	-3.68	-1.74677	-0.194524	NM_001136494,NM_001136495,NM_032800	NaN

3123 rows × 7 columns
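The same import, filter and select steps are repeated for every compound and concentration. They could be wrapped in a small helper, sketched below under the assumption that all tables share the column names shown above. The names KEPT_COLUMNS and prepare_deg_table are hypothetical, and the example rows are illustrative only.

```python
import pandas as pd

# Columns kept in the exported tables of this section
KEPT_COLUMNS = ["Gene.Symbol", "adj.P.Val", "t", "B", "logFC", "GB_LIST", "SPOT_ID"]


def prepare_deg_table(table, cutoff=0.05):
    """Filter on adj.P.Val and keep the columns used for the mapping."""
    significant = table[table["adj.P.Val"] < cutoff]
    # Selecting the columns by name also leaves out the 'ID' column.
    return significant[KEPT_COLUMNS]


# Illustrative rows only; a real call would start from pd.read_csv(..., sep='\t')
example = pd.DataFrame({
    "ID": ["p1", "p2"],
    "Gene.Symbol": ["GENE1", "GENE2"],
    "adj.P.Val": [0.001, 0.30],
    "t": [12.0, 1.1],
    "B": [8.0, -4.0],
    "logFC": [1.2, 0.1],
    "GB_LIST": ["NM_000001", "NM_000002"],
    "SPOT_ID": [None, None],
})
print(prepare_deg_table(example))
```

The returned dataframe can then be written with to_excel(..., index=False), exactly as in the cells above.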

step 49: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the exported expression table holds the column names. You also select ‘node’ for table and ‘CTL.geneID’ for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 50: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 51: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
No expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 52: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.7.2 Concentration 2

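step 53: You first import the expression table into a new dataframe, filter for the significant genes (adj.P.Val < 0.05), drop the ID column and keep only the columns needed for the mapping. The final table is exported to an Excel file so that it can be used as input for the py4cytoscape function.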

Troglitazone_concentration2= pd.read_csv('GSE69844.TroglitazoneConcentration2.tsv',sep='\t')
Adjusted_Troglitazone_concentration2= Troglitazone_concentration2[Troglitazone_concentration2['adj.P.Val'] < 0.05]
Troglitazone_concentration_2=Adjusted_Troglitazone_concentration2.drop('ID', axis=1)
Troglitazone_Concentration2 = Troglitazone_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration2.to_excel('GSE69844.Troglitazone-concentration2.xlsx',index=False)
Troglitazone_Concentration2

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	FABP4	1.000000e-12	57.329825	13.74886	3.351827	NM_001442	NaN
1	PLIN4	2.340000e-05	14.969008	9.66342	0.709311	NM_001080400	NaN
2	ATP2B4	5.990000e-05	13.451635	8.95408	0.550716	NM_001001396,NM_001684	NaN
3	PDK4	5.990000e-05	13.146994	8.79403	1.124497	NM_002612	NaN
4	DLC1	9.970000e-05	12.396132	8.37088	0.527070	NM_001164271,NM_006094,NM_024767,NM_182643	NaN
...	...	...	...	...	...	...	...
69	TXNIP	4.420000e-02	5.718843	1.90694	0.311458	NM_006472	NaN
70	PHLDA3	4.420000e-02	5.712639	1.89787	0.227240	NM_012396	NaN
71	ATP1B1	4.600000e-02	-5.680871	1.85131	-0.231002	NM_001001787,NM_001677	NaN
72	ANKRD1	4.690000e-02	-5.661073	1.82221	-0.295793	NM_014391	NaN
73	ZBED3	4.920000e-02	5.625298	1.76945	0.301710	NM_032367	NaN

74 rows × 7 columns

step 54: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the exported expression table holds the column names. You also select ‘node’ for table and ‘CTL.geneID’ for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 55: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 56: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
No expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 57: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

4.7.3 Concentration 3

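step 58: You first import the expression table into a new dataframe, filter for the significant genes (adj.P.Val < 0.05), drop the ID column and keep only the columns needed for the mapping. The final table is exported to an Excel file so that it can be used as input for the py4cytoscape function.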

Troglitazone_concentration3= pd.read_csv('GSE69844.TroglitazoneConcentration3.tsv',sep='\t')
Adjusted_Troglitazone_concentration3= Troglitazone_concentration3[Troglitazone_concentration3['adj.P.Val'] < 0.05]
Troglitazone_concentration_3=Adjusted_Troglitazone_concentration3.drop('ID', axis=1)
Troglitazone_Concentration3 = Troglitazone_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration3.to_excel('GSE69844.Troglitazone-concentration3.xlsx',index=False)
Troglitazone_Concentration3

	Gene.Symbol	adj.P.Val	t	B	logFC	GB_LIST	SPOT_ID
0	FABP4	2.190000e-17	63.189122	33.180151	3.760460	NM_001442	NaN
1	PDK4	8.910000e-12	27.858041	25.280555	1.669789	NM_002612	NaN
2	KLF9	8.910000e-12	-27.795929	25.251639	-1.271744	NM_001206	NaN
3	INSIG1	8.910000e-12	27.633020	25.175329	1.021568	NM_005542,NM_198336,NM_198337	NaN
4	CYP1B1	4.890000e-11	24.684925	23.668933	1.327687	NM_000104	NaN
...	...	...	...	...	...	...	...
9171	RPA2	4.990000e-02	2.923957	-3.330818	0.110598	NM_002946	NaN
9172	INHBE	4.990000e-02	2.923939	-3.330855	0.214330	NM_031479	NaN
9173	C5orf41	4.990000e-02	-2.923657	-3.331433	-0.113331	NM_001168393,NM_001168394,NM_153607	NaN
9174	C7orf68	4.990000e-02	2.923642	-3.331465	0.281327	NM_001098786,NM_013332	NaN
9175	RPL24	4.990000e-02	2.923395	-3.331971	0.112186	NM_000986	NaN

9176 rows × 7 columns

step 59: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the exported expression table holds the column names. You also select ‘node’ for table and ‘CTL.geneID’ for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

step 60: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 61: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
No expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 62: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')

''

Section 5: Mapping of dataset: GSE44729

In this section, you will map the transcriptomics expression data of dataset GSE44729. This dataset aimed to transcriptionally profile BEAS-2B cells in order to compare controls with skin sensitizers, controls with respiratory sensitizers, and controls with non-sensitizing irritants.

5.1 ACR exposure time 1

step 63: You first import the expression table into a new dataframe.

ACR_10h= pd.read_csv('adaptedACR10h.tsv',sep='\t')
ACR_10h

	ID	COL	ROW	NAME	SPOT_ID	CONTROL_TYPE	REFSEQ	GB_ACC	GENE	GENE_SYMBOL	...	SEQUENCE	SPOT_ID.1	ORDER	logFC	AveExpr	t	P.Value	padj	B	ENSEMBLE_GENE_ID
0	30939	85	3	A_24_P102821	A_24_P102821	False	NM_000952	NM_000952	5724	PTAFR	...	ATACGGTCACTGAAGTGGTTGTGCCATTCAACCAGATCCCTGGCAA...	NaN	30939	0.472665	0.049355	5.812795	2.199154e-07	0.005924	5.278602	ENSG00000169403

1 rows × 29 columns

step 64: Unfortunately, the PTAFR gene is not included in the AOP network and thus can’t be mapped.

p4c.load_table_data_from_file('adaptedACR10h.tsv', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')

{'mappedTables': [388893, 388931]}

5.2 ACR exposure time 2

step 65: You first import the expression table into a new dataframe.

ACR_24h= pd.read_csv('adaptedACR24h.tsv',sep='\t')
ACR_24h

	ID	COL	ROW	NAME	SPOT_ID	CONTROL_TYPE	REFSEQ	GB_ACC	GENE	GENE_SYMBOL	...	SEQUENCE	SPOT_ID.1	ORDER	logFC	AveExpr	t	P.Value	padj	B	ENSEMBLE_GENE_ID
0	42685	15	142	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	42685	1.179039	0.954675	6.285084	3.447616e-08	0.000253	6.360406	ENSG00000100292
1	40823	26	126	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	40823	1.166378	0.956092	6.242149	4.085234e-08	0.000253	6.263043	ENSG00000100292
2	9749	209	54	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	9749	1.173634	0.954391	6.235767	4.189503e-08	0.000253	6.248567	ENSG00000100292
3	29347	94	127	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	29347	1.174612	0.964019	6.234992	4.202344e-08	0.000253	6.246809	ENSG00000100292
4	43478	11	85	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	43478	1.181977	0.956175	6.228517	4.311172e-08	0.000253	6.232119	ENSG00000100292
5	36362	53	37	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	36362	1.180659	0.960191	6.218832	4.479195e-08	0.000253	6.210143	ENSG00000100292
6	4189	242	123	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	4189	1.170264	0.957791	6.176783	5.287237e-08	0.000253	6.114708	ENSG00000100292
7	20842	144	137	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	20842	1.165382	0.963625	6.148413	5.912487e-08	0.000253	6.050293	ENSG00000100292
8	6877	226	18	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	6877	1.166841	0.951196	6.142273	6.057178e-08	0.000253	6.036349	ENSG00000100292
9	16256	171	129	A_23_P120883	A_23_P120883	False	NM_002133	NM_002133	3162	HMOX1	...	TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT...	NaN	16256	1.162878	0.964791	6.137396	6.174589e-08	0.000253	6.025273	ENSG00000100292
10	26200	112	132	A_32_P42684	A_32_P42684	False	NM_014331	NM_014331	23657	SLC7A11	...	TTACTGATACTAAATGTTGGCTACCTGTGATTTTATAGTATGCACA...	NaN	26200	0.752090	0.640196	5.433205	9.494058e-07	0.003561	4.425625	ENSG00000151012
11	21063	143	35	A_23_P212655	A_23_P212655	False	NM_130446	NM_130446	89857	KLHL6	...	TTCTGGTCTCAATGGCTTCGGGAAACACACATATACACATACACCA...	NaN	21063	-0.748734	-0.040346	-4.660551	1.681000e-05	0.031843	2.698548	ENSG00000172578
12	28950	96	72	A_23_P313828	A_23_P313828	False	NM_181716	NM_181716	201161	CENPV	...	TTTGACTGCAATTGCAGCATTTGCAAGAAGAAGCAGAATAGACACT...	NaN	28950	-0.234981	0.018625	-4.523276	2.750745e-05	0.042802	2.398681	ENSG00000166582
13	18739	156	94	A_23_P25487	A_23_P25487	False	NM_018018	NM_018018	55089	SLC38A4	...	TGTTCTGGTCATCCTTGTGCCAACTATAAAATACATCTTCGGATTC...	NaN	18739	0.293797	-0.025377	4.464863	3.385527e-05	0.047625	2.271960	ENSG00000139209
14	21386	141	69	A_23_P163402	A_23_P163402	False	NM_000499	NM_000499	1543	CYP1A1	...	GGTAAAACAGGGCCACATAGATGCTGATGGAGCCTTCCCAAGTTGT...	NaN	21386	-0.234158	0.184573	-4.450278	3.565014e-05	0.047916	2.240408	ENSG00000140465
15	32586	75	109	A_32_P165477	A_32_P165477	False	NM_014331	NM_014331	23657	SLC7A11	...	CATTTTGCTTTCCTAACCATTCAGTCAGGAATTAAAATATGGCATT...	NaN	32586	0.758739	0.723792	4.431768	3.806166e-05	0.048953	2.200414	ENSG00000151012

16 rows × 29 columns

step 66: You now filter the expression table for the significant genes (padj < 0.05), drop the ID column and keep only the columns needed for the mapping. The resulting table is exported to an Excel file so that it can be used as input for the py4cytoscape function.

ACR24h= pd.read_csv('adaptedACR24h.tsv',sep='\t')
ACR24h_version1= ACR24h[ACR24h['padj'] < 0.05]
ACR24h_version2=ACR24h_version1.drop('ID', axis=1)
ACR24h_version3= ACR24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
ACR24h_version3.to_excel('ACR24h-adjusted.xlsx',index=False)
ACR24h_version3

	GENE_SYMBOL	padj	t	B	logFC	SPOT_ID
0	HMOX1	0.000253	6.285084	6.360406	1.179039	A_23_P120883
1	HMOX1	0.000253	6.242149	6.263043	1.166378	A_23_P120883
2	HMOX1	0.000253	6.235767	6.248567	1.173634	A_23_P120883
3	HMOX1	0.000253	6.234992	6.246809	1.174612	A_23_P120883
4	HMOX1	0.000253	6.228517	6.232119	1.181977	A_23_P120883
5	HMOX1	0.000253	6.218832	6.210143	1.180659	A_23_P120883
6	HMOX1	0.000253	6.176783	6.114708	1.170264	A_23_P120883
7	HMOX1	0.000253	6.148413	6.050293	1.165382	A_23_P120883
8	HMOX1	0.000253	6.142273	6.036349	1.166841	A_23_P120883
9	HMOX1	0.000253	6.137396	6.025273	1.162878	A_23_P120883
10	SLC7A11	0.003561	5.433205	4.425625	0.752090	A_32_P42684
11	KLHL6	0.031843	-4.660551	2.698548	-0.748734	A_23_P212655
12	CENPV	0.042802	-4.523276	2.398681	-0.234981	A_23_P313828
13	SLC38A4	0.047625	4.464863	2.271960	0.293797	A_23_P25487
14	CYP1A1	0.047916	-4.450278	2.240408	-0.234158	A_23_P163402
15	SLC7A11	0.048953	4.431768	2.200414	0.758739	A_32_P165477
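The table above contains ten rows for HMOX1 and two rows for SLC7A11, because several array spots map to the same gene. The notebook does not state which of these rows ends up on the node when the table is loaded, so the following sketch shows one possible way to make that choice explicit, assuming you keep the row with the smallest padj per gene. The function name one_row_per_gene is hypothetical.

```python
import pandas as pd


def one_row_per_gene(table, gene_column="GENE_SYMBOL", p_column="padj"):
    """Keep, for every gene, the row with the smallest adjusted p-value."""
    # A stable sort keeps the original order among rows with equal p-values.
    ordered = table.sort_values(p_column, kind="stable")
    unique = ordered.drop_duplicates(subset=gene_column, keep="first")
    return unique.reset_index(drop=True)


# Three rows copied from the ACR 24h table above
example = pd.DataFrame({
    "GENE_SYMBOL": ["HMOX1", "HMOX1", "SLC7A11"],
    "padj": [0.000253, 0.000253, 0.003561],
    "logFC": [1.179039, 1.166378, 0.752090],
})
print(one_row_per_gene(example))
```

Other rules, such as averaging the logFC values per gene, are equally defensible; the point is only that the rule is stated before the table is exported.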

step 67: You now set the current network to the clone of the network.

p4c.set_current_network('clone-GSE69844')

{}

step 68: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the table holds the column names. You also select ‘node’ for table and ‘CTL.geneID’ for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('ACR24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID',network='clone-GSE69844')

{'mappedTables': [235742, 235780]}

step 69: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 70: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
No expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 71: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')

''

5.3 MA exposure time 1

step 72: You first import the expression table into a new dataframe.

MA_10h= pd.read_csv('adaptedMA10h.tsv',sep='\t')
MA_10h

	ID	COL	ROW	NAME	SPOT_ID	CONTROL_TYPE	REFSEQ	GB_ACC	GENE	GENE_SYMBOL	...	SEQUENCE	SPOT_ID.1	ORDER	logFC	AveExpr	t	P.Value	padj	B	ENSEMBLE_GENE_ID
0	20269	147	94	A_23_P435029	A_23_P435029	False	NaN	BC015544	NaN	H3C14	...	CATCACAGTTGACAGGTTAAAAGCATTCACTGCAGCGATCTATGAG...	NaN	20269	-0.424471	0.018465	-5.210051	0.000002	0.024894	3.914649	ENSG00000203811
1	5888	232	125	A_23_P428298	A_23_P428298	False	NM_173561	NM_173561	222643.0	UNC5CL	...	GGGGATATTTTCCCCATGGATCAAGATCCAGTTTAGGGTTGGGAAA...	NaN	5888	-0.602074	0.081130	-4.989385	0.000005	0.045422	3.420390	ENSG00000124602
2	9982	208	97	A_24_P159434	A_24_P159434	False	NM_007261	NM_007261	11314.0	CD300A	...	AGTTTCTCTGGACTCTTAGGTTTATTTTTAATATGAAATATAAAAA...	NaN	9982	0.463349	-0.037320	4.850629	0.000008	0.048363	3.111913	ENSG00000167851
3	19016	155	49	A_23_P87678	A_23_P87678	False	NM_004950	NM_004950	1833.0	EPYC	...	GGATTGATCTGACATCAAATTTAATATCTGAGATTGATGAAGATGC...	NaN	19016	0.281188	0.000073	4.846602	0.000009	0.048363	3.102991	ENSG00000083782

4 rows × 29 columns

step 73: Unfortunately, these four genes are not included in the AOP network and thus can’t be mapped.
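Whether any gene of a table is present in the network can be checked before loading it. The following is a minimal sketch, assuming the node keys have already been read into a plain list (for example from the column retrieved with p4c.get_table_columns) and that this column holds the same kind of identifier as the table. The function name genes_in_network and the node keys in the example are hypothetical.

```python
def genes_in_network(gene_symbols, node_keys):
    """Return the gene symbols that also occur among the node keys, sorted."""
    # Drop missing keys (None or NaN) before comparing.
    keys = {str(k) for k in node_keys if k is not None and k == k}
    return sorted(set(gene_symbols) & keys)


ma_10h_genes = ["H3C14", "UNC5CL", "CD300A", "EPYC"]  # from the table above
node_keys = ["CYP1A1", "HMOX1", None]                 # illustrative only
# An empty list means that nothing can be mapped.
print(genes_in_network(ma_10h_genes, node_keys))
```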

5.4 MA exposure time 2

step 74: You first import the expression table into a new dataframe.

MA_24h= pd.read_csv('adaptedMA24h.tsv',sep='\t')
MA_24h

	ID	COL	ROW	NAME	SPOT_ID	CONTROL_TYPE	REFSEQ	GB_ACC	GENE	GENE_SYMBOL	...	SEQUENCE	SPOT_ID.1	ORDER	logFC	AveExpr	t	P.Value	padj	B	ENSEMBLE_GENE_ID
0	21386	141	69	A_23_P163402	A_23_P163402	False	NM_000499	NM_000499	1543.0	CYP1A1	...	GGTAAAACAGGGCCACATAGATGCTGATGGAGCCTTCCCAAGTTGT...	NaN	21386	-0.821543	0.184573	-15.613779	2.421857e-23	5.450995e-19	41.618011	ENSG00000140465
1	16486	170	9	A_23_P257803	A_23_P257803	False	NM_013391	NM_013391	29958.0	DMGDH	...	TGGTATTGACCGAACCAACCAGAAACCGGCTTCAGAAAAAAGGTGG...	NaN	16486	-0.876482	0.050920	-14.794873	3.529727e-22	5.296355e-18	39.338343	ENSG00000132837
2	20269	147	94	A_23_P435029	A_23_P435029	False	NaN	BC015544	NaN	H3C14	...	CATCACAGTTGACAGGTTAAAAGCATTCACTGCAGCGATCTATGAG...	NaN	20269	-1.086935	0.018465	-13.341288	5.056895e-20	5.690903e-16	35.039718	ENSG00000203811
3	16942	167	117	A_23_P1676	A_23_P1676	False	NM_001080546	NM_001080546	219854.0	TMEM218	...	TACCCGTACCTTAGGATTTCCAACTGTTTTGAAAGGGAAATAGTAA...	NaN	16942	0.455706	-0.067257	12.524192	9.277620e-19	8.352641e-15	32.479715	ENSG00000150433
4	25809	115	63	A_23_P147950	A_23_P147950	False	NM_152419	NM_152419	138050.0	HGSNAT	...	TCTTTGGAACTTCATTCCGAGGAGATAAGCTTTAACTTTCCAAAAG...	NaN	25809	0.471999	-0.082378	12.106840	4.236592e-18	3.178503e-14	31.132408	ENSG00000165102
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2541	15620	175	41	A_24_P331704	A_24_P331704	False	NM_182507	NM_182507	144501.0	KRT80	...	CCAAGGGAGCAAATCCTCAGTGGGGATACAAGACATATAAAGTATA...	NaN	15620	-0.301665	0.012832	-2.916673	4.895606e-03	4.981367e-02	-0.681984	ENSG00000167767
2542	26372	111	128	A_23_P351724	A_23_P351724	False	NM_022648	NM_022648	7145.0	TNS1	...	CTCTAAGCCAGAATGGAAAATTCACCAGGACTCCATTCTTAAGCCT...	NaN	26372	0.312052	-0.254361	2.916410	4.899232e-03	4.983931e-02	-0.682639	ENSG00000079308
2543	40411	29	99	A_23_P369701	A_23_P369701	False	NM_021214	NM_021214	58489.0	ABHD17C	...	ATTACTAGCCAACAGAGTTTTACTATTTTGATTGTCTGGTTGGTTT...	NaN	40411	0.200178	0.007205	2.915475	4.912166e-03	4.993703e-02	-0.684974	ENSG00000136379
2544	33634	69	53	A_23_P62115	A_23_P62115	False	NM_003254	NM_003254	7076.0	TIMP1	...	CATGGAGAGTGTCTGCGGATACTTCCACAGGTCCCACAACCGCAGC...	NaN	33634	-0.221966	0.073864	-2.915367	4.913662e-03	4.994095e-02	-0.685243	ENSG00000102265
2545	29769	91	134	A_23_P133279	A_23_P133279	False	NM_199133	NM_199133	134145.0	ATPSCKMT	...	CTTGAGAGCTGCCACTCATTTAATATTTCTCATTTATGAGAAGAGA...	NaN	29769	0.147226	-0.003311	2.915143	4.916752e-03	4.996108e-02	-0.685800	ENSG00000150756

2546 rows × 29 columns

step 75: You now filter the expression table for the significant genes (padj < 0.05), drop the ID column and keep only the columns needed for the mapping. The resulting table is exported to an Excel file so that it can be used as input for the py4cytoscape function.

MA24h= pd.read_csv('adaptedMA24h.tsv',sep='\t')
MA24h_version1= MA24h[MA24h['padj'] < 0.05]
MA24h_version2=MA24h_version1.drop('ID', axis=1)
MA24h_version3= MA24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
MA24h_version3.to_excel('MA24h-adjusted.xlsx',index=False)
MA24h_version3

	GENE_SYMBOL	padj	t	B	logFC	SPOT_ID
0	CYP1A1	5.450995e-19	-15.613779	41.618011	-0.821543	A_23_P163402
1	DMGDH	5.296355e-18	-14.794873	39.338343	-0.876482	A_23_P257803
2	H3C14	5.690903e-16	-13.341288	35.039718	-1.086935	A_23_P435029
3	TMEM218	8.352641e-15	12.524192	32.479715	0.455706	A_23_P1676
4	HGSNAT	3.178503e-14	12.106840	31.132408	0.471999	A_23_P147950
...	...	...	...	...	...	...
2541	KRT80	4.981367e-02	-2.916673	-0.681984	-0.301665	A_24_P331704
2542	TNS1	4.983931e-02	2.916410	-0.682639	0.312052	A_23_P351724
2543	ABHD17C	4.993703e-02	2.915475	-0.684974	0.200178	A_23_P369701
2544	TIMP1	4.994095e-02	-2.915367	-0.685243	-0.221966	A_23_P62115
2545	ATPSCKMT	4.996108e-02	2.915143	-0.685800	0.147226	A_23_P133279

2546 rows × 6 columns

step 76: You now set the current network to the clone of the network.

p4c.set_current_network('clone-GSE69844')

{}

step 77: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the table holds the column names. You also select ‘node’ for table and ‘CTL.GeneID’ for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('MA24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='clone-GSE69844')

{'mappedTables': [235742, 235780]}

step 78: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 79: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
No expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 80: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')

''

5.5 CP exposure time 1

step 81: You first import the expression table into a new dataframe.

CP_10h= pd.read_csv('adaptedCP10h.tsv',sep='\t')
CP_10h

	ID	COL	ROW	NAME	SPOT_ID	CONTROL_TYPE	REFSEQ	GB_ACC	GENE	GENE_SYMBOL	...	SEQUENCE	SPOT_ID.1	ORDER	logFC	AveExpr	t	P.Value	padj	B	ENSEMBLE_GENE_ID
0	26605	110	2	A_23_P216225	A_23_P216225	False	NM_004430	NM_004430	1960	EGR3	...	GGTTGTGAATTTCCAGGTACTTGGACTTTTTGTAGAAGTAGAGAGA...	NaN	26605	0.863408	0.094340	7.955544	4.246899e-11	0.000002	10.081124	ENSG00000179388
1	41618	22	65	A_23_P212639	A_23_P212639	False	NM_004593	NM_004593	6434	TRA2A	...	GCATTTGTGTAGTTTGGTGCTTTGTTCCAAGTTAAGTGTTTTCAGA...	NaN	41618	0.232531	0.026262	6.309106	3.135061e-08	0.000470	6.437660	ENSG00000164548
2	23385	129	151	A_24_P416370	A_24_P416370	False	NM_024015	NM_024015	3214	HOXB4	...	CAGCAGAAGCCTCTCTCCTAGACTGAAAATGAATGTGAAACTAGGA...	NaN	23385	-0.241820	-0.033594	-5.680978	3.665825e-07	0.002750	5.006423	ENSG00000182742
3	12053	196	35	A_23_P79155	A_23_P79155	False	NM_001508	NM_001508	2863	GPR39	...	TGGAAGAACAATGCAGGAGGGGGTGGCATCTCCTTCAGCTTCAGCA...	NaN	12053	-0.201463	-0.027988	-5.199354	2.302939e-06	0.010886	3.913296	ENSG00000183840
4	2479	252	143	A_23_P106194	A_23_P106194	False	NM_005252	NM_005252	2353	FOS	...	AGAGGGTTCCTGTAGACCTAGGGAGGACCTTATCTGTGCGTGAAAC...	NaN	2479	0.652862	0.097279	5.152892	2.742032e-06	0.011221	3.808527	ENSG00000170345
5	5075	237	51	A_23_P143143	A_23_P143143	False	NM_002166	NM_002166	3398	ID2	...	AGGCTTCTGAATTCCCTTCTGAGTTAATGTCAAATGACAGCAAAGC...	NaN	5075	0.631542	0.093306	5.023820	4.439835e-06	0.014181	3.518391	ENSG00000115738
6	18707	156	158	A_23_P39704	A_23_P39704	False	NM_001031684	NM_001031684	6432	SRSF7	...	CTCTCTTCGTAGATCAAGATCAGCTTCACTCAGAAGATCTAGGTCT...	NaN	18707	0.196261	0.014785	4.813274	9.650441e-06	0.024134	3.048559	ENSG00000115875
7	1390	258	112	A_23_P131846	A_23_P131846	False	NM_005985	NM_005985	6615	SNAI1	...	AACAATGTCTGAAAAGGGACTGTGAGTAATGGCTGTCACTTGTCGG...	NaN	1390	0.515691	0.161391	4.589456	2.171080e-05	0.044439	2.554946	ENSG00000124216
8	16555	169	42	A_24_P401615	A_24_P401615	False	NM_001039361	NM_001039361	343071	PRAMEF10	...	TTACCTGAGCCAGATGAGCAATCTTCGTGAACTCTTTTTAGCCTTC...	NaN	16555	0.595546	0.072522	4.589357	2.171847e-05	0.044439	2.554731	ENSG00000187545

9 rows × 29 columns

step 82: You now filter the expression table for the significant genes (padj < 0.05), drop the ID column and keep only the columns needed for the mapping. The resulting table is exported to an Excel file so that it can be used as input for the py4cytoscape function.

CP10h= pd.read_csv('adaptedCP10h.tsv',sep='\t')
CP10h_version1= CP10h[CP10h['padj'] < 0.05]
CP10h_version2=CP10h_version1.drop('ID', axis=1)
CP10h_version3= CP10h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
CP10h_version3.to_excel('CP10h-adjusted.xlsx',index=False)
CP10h_version3

	GENE_SYMBOL	padj	t	B	logFC	SPOT_ID
0	EGR3	0.000002	7.955544	10.081124	0.863408	A_23_P216225
1	TRA2A	0.000470	6.309106	6.437660	0.232531	A_23_P212639
2	HOXB4	0.002750	-5.680978	5.006423	-0.241820	A_24_P416370
3	GPR39	0.010886	-5.199354	3.913296	-0.201463	A_23_P79155
4	FOS	0.011221	5.152892	3.808527	0.652862	A_23_P106194
5	ID2	0.014181	5.023820	3.518391	0.631542	A_23_P143143
6	SRSF7	0.024134	4.813274	3.048559	0.196261	A_23_P39704
7	SNAI1	0.044439	4.589456	2.554946	0.515691	A_23_P131846
8	PRAMEF10	0.044439	4.589357	2.554731	0.595546	A_24_P401615

step 83: You now set the current network to the clone of the network.

p4c.set_current_network('clone-GSE69844')

{}

step 84: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to ‘True’ because the first row of the table holds the column names. You also select ‘node’ for table and ‘CTL.geneID’ for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('CP10h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID',network='clone-GSE69844')

{'mappedTables': [235742, 235780]}

step 85: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 86: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
No expression value = white node color
High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
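Because the white breakpoint is placed halfway between the minimum and the maximum, white only corresponds to logFC = 0 when the values are symmetric around zero. The sketch below, in plain Python with the logFC values of the CP 10 h table above, compares the midpoint breakpoints with a symmetric alternative. The function names midpoint_breakpoints and symmetric_breakpoints are hypothetical helpers, and the symmetric variant is an assumption about what you might prefer, not part of the original workflow.

```python
def midpoint_breakpoints(values):
    """Return [min, midpoint, max], mirroring the three variables defined above."""
    low, high = min(values), max(values)
    return [low, low + (high - low) / 2, high]


def symmetric_breakpoints(values):
    """Return [-m, 0, m] with m the largest absolute value, so white sits at logFC = 0."""
    limit = max(abs(v) for v in values)
    return [-limit, 0.0, limit]


# logFC values copied from the CP 10 h table shown above
logfc = [0.863408, 0.232531, -0.241820, -0.201463, 0.652862,
         0.631542, 0.196261, 0.515691, 0.595546]

print(midpoint_breakpoints(logfc))
print(symmetric_breakpoints(logfc))
```

With these values the midpoint lies at roughly 0.31, so a gene with a small positive log2 fold change would still be drawn in a bluish color. The symmetric variant avoids this, at the cost of a less saturated blue end.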

step 87: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')

''
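To illustrate what a continuous mapping (mapping_type='c') does with the three breakpoints, here is a minimal sketch of piecewise-linear color interpolation in plain Python. It is not Cytoscape's implementation: the helper names are hypothetical, and the clamping of values outside the range and the rounding are assumptions made for illustration.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' into an (r, g, b) tuple of integers."""
    color = color.lstrip('#')
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of integers into '#RRGGBB'."""
    return '#{:02X}{:02X}{:02X}'.format(*rgb)


def interpolate_color(value, breakpoints, colors):
    """Piecewise-linear color for `value`; values outside the range are clamped."""
    if value <= breakpoints[0]:
        return colors[0]
    if value >= breakpoints[-1]:
        return colors[-1]
    for i in range(len(breakpoints) - 1):
        lo, hi = breakpoints[i], breakpoints[i + 1]
        if lo <= value <= hi:
            fraction = (value - lo) / (hi - lo)
            start, end = hex_to_rgb(colors[i]), hex_to_rgb(colors[i + 1])
            mixed = tuple(int(s + (e - s) * fraction + 0.5)
                          for s, e in zip(start, end))
            return rgb_to_hex(mixed)


# Illustrative breakpoints; in the notebook they come from the logFC column
breakpoints = [-1.0, 0.0, 1.0]
colors = ['#0000FF', '#FFFFFF', '#FF0000']
for logfc in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(logfc, interpolate_color(logfc, breakpoints, colors))
```

Values between two breakpoints receive a blend of the two neighbouring colors, which is why genes with a logFC close to the white breakpoint appear pale.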

5.6 CP exposure time 2

step 88: You first import the expression table into a new dataframe.

CP_24h= pd.read_csv('adaptedCP24h.tsv',sep='\t')
CP_24h

	ID	COL	ROW	NAME	SPOT_ID	CONTROL_TYPE	REFSEQ	GB_ACC	GENE	GENE_SYMBOL	...	SEQUENCE	SPOT_ID.1	ORDER	logFC	AveExpr	t	P.Value	padj	B	ENSEMBLE_GENE_ID
0	2985	249	151	A_24_P270728	A_24_P270728	False	NM_001042483	NM_001042483	26471.0	NUPR1	...	TATTCCCGCTGACTGAGTCTCTGAGGGGCTACCAGGAAAGCGCCTC...	NaN	2985	0.569941	0.167603	10.048946	1.026382e-14	4.620257e-10	22.754357	ENSG00000176046
1	26605	110	2	A_23_P216225	A_23_P216225	False	NM_004430	NM_004430	1960.0	EGR3	...	GGTTGTGAATTTCCAGGTACTTGGACTTTTTGTAGAAGTAGAGAGA...	NaN	26605	1.036190	0.094340	9.547580	7.333753e-14	1.650645e-09	21.111511	ENSG00000179388
2	20886	144	49	A_23_P1691	A_23_P1691	False	NM_002421	NM_002421	4312.0	MMP1	...	ACATGTGCAGTCACTGGTGTCACCCTGGATAGGCAAGGGATAACTC...	NaN	20886	-1.110417	0.336235	-9.060676	5.051693e-13	7.580066e-09	19.484600	ENSG00000196611
3	38540	40	101	A_23_P1691	A_23_P1691	False	NM_002421	NM_002421	4312.0	MMP1	...	ACATGTGCAGTCACTGGTGTCACCCTGGATAGGCAAGGGATAACTC...	NaN	38540	-1.050224	0.329034	-8.267679	1.208709e-11	8.352296e-08	16.778923	ENSG00000196611
4	33376	70	60	A_24_P85300	A_24_P85300	False	NM_020733	NM_020733	57493.0	HEG1	...	AGGATGAGCGTACCACTGAAGTCTGAAGATGTCGCCATTGAACGGA...	NaN	33376	0.404108	-0.001845	8.218781	1.471445e-11	8.352296e-08	16.610201	ENSG00000173706
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
671	17270	165	141	A_32_P800179	A_32_P800179	False	NaN	AK094933	NaN	SLC30A6	...	CTGTGTGTTAAAAGCATTGTATACGTGAAAAAGGACTCAAACTCAT...	NaN	17270	-0.295208	-0.018674	-3.350876	1.364742e-03	4.938413e-02	0.583949	ENSG00000152683
672	20415	146	142	A_23_P162589	A_23_P162589	False	NM_001017535	NM_001017535	7421.0	VDR	...	CAAGCGAGGTCAACAGAGAAGGCAGGAATGTGTGGCAGATTTAGTG...	NaN	20415	-0.180143	0.077814	-3.350611	1.365847e-03	4.938443e-02	0.583251	ENSG00000111424
673	16828	168	5	A_24_P4816	A_24_P4816	False	NM_031412	NM_031412	23710.0	GABARAPL1	...	GGATTGGCTTTGATAGAGGAATGGGGATGATGTAAGTTTACAGTAT...	NaN	16828	0.328695	0.144492	3.350035	1.368258e-03	4.943190e-02	0.581731	ENSG00000139112
674	7523	222	86	A_23_P18413	A_23_P18413	False	NM_016589	NM_016589	51300.0	TIMMDC1	...	TGCTGACAAATTTAAGTGCTGGTACCTGTGGTGGCAGTGGCTTGCT...	NaN	7523	-0.104950	0.041448	-3.348029	1.376679e-03	4.965340e-02	0.576441	ENSG00000113845
675	43883	8	126	A_23_P366983	A_23_P366983	False	NM_013381	NM_013381	29953.0	TRHDE	...	AGTTACCACATATTCACGTTTATAAAATCCTTAATTAAATGAGTAA...	NaN	43883	0.170322	-0.008790	3.347479	1.378996e-03	4.966040e-02	0.574992	ENSG00000072657

676 rows × 29 columns
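The table above lists MMP1 twice (probe A_23_P1691 at two array positions), each row with its own logFC. Because the expression data is later matched to one node per gene, you may want to decide explicitly which row represents a gene. The sketch below is one possible approach, not part of the original workflow: it keeps the row with the smallest padj per gene symbol. The selection criterion and the variable names probes and one_row_per_gene are assumptions, and the rows are copied from the displayed table.

```python
import pandas as pd

# Rows copied from the CP 24 h table shown above (only the columns needed here)
probes = pd.DataFrame({
    'GENE_SYMBOL': ['NUPR1', 'EGR3', 'MMP1', 'MMP1', 'HEG1'],
    'logFC': [0.569941, 1.036190, -1.110417, -1.050224, 0.404108],
    'padj': [4.620257e-10, 1.650645e-09, 7.580066e-09, 8.352296e-08, 8.352296e-08],
})

# Keep, per gene symbol, the row with the smallest adjusted p-value
one_row_per_gene = (
    probes.sort_values('padj')
          .drop_duplicates(subset='GENE_SYMBOL', keep='first')
          .reset_index(drop=True)
)
print(one_row_per_gene)
```

Other criteria, such as the largest absolute logFC or the mean over probes, are equally possible; the point is to make the choice visible before the table is exported.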

step 89: You select the genes with an adjusted p-value (padj) below 0.05, drop the ID column and keep only the columns GENE_SYMBOL, padj, t, B, logFC and SPOT_ID. The resulting table is exported to an Excel file, so that it can be imported into the node table of the AOP network in the following steps.

# Read the 24 h table; step 88 loads the same file, 'adaptedCP24h.tsv'
CP24h = pd.read_csv('adaptedCP24h.tsv', sep='\t')
# Keep the genes with an adjusted p-value below 0.05
CP24h_version1 = CP24h[CP24h['padj'] < 0.05]
CP24h_version2 = CP24h_version1.drop('ID', axis=1)
# Keep only the columns needed for the mapping
CP24h_version3 = CP24h_version2[['GENE_SYMBOL', 'padj', 't', 'B', 'logFC', 'SPOT_ID']]
CP24h_version3.to_excel('CP24h-adjusted.xlsx', index=False)
CP24h_version3

step 90: You now set the clone of the network as the current network, so that the following steps are applied to it.

p4c.set_current_network('clone-GSE69844')

{}

step 91: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the table holds the column names. You also select 'node' for table and 'CTL.GeneID' for table_key_column, so that the expression data is matched to the gene nodes through the column that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('CP24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='clone-GSE69844')

{'mappedTables': [235742, 235780]}

step 92: You define the style for the mapping so that the expression values (log2 fold change, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column from the node table.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 93: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

Low expression value (minimum) = blue node color
Midpoint between the minimum and maximum expression value = white node color
High expression value (maximum) = red node color

This color scheme is also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 94: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')

''

Section 6: Metadata

step 95: Finally, the metadata belonging to this Jupyter notebook is displayed. It contains the version numbers of the packages and the system set-up for interested users. This requires the packages watermark and print_versions.

%load_ext watermark
!pip install print-versions

Requirement already satisfied: print-versions in c:\users\shaki\anaconda3\lib\site-packages (0.1.0)

%watermark

Last updated: 2025-06-03T17:26:16.367030+02:00

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.25.0

Compiler    : MSC v.1938 64 bit (AMD64)
OS          : Windows
Release     : 11
Machine     : AMD64
Processor   : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores   : 8
Architecture: 64bit

from print_versions import print_versions
print_versions(globals())

json==2.0.9
ipykernel==6.28.0
numpy==1.26.4
pandas==2.2.2
ipywidgets==8.0.3
xarray==2023.6.0
py4cytoscape==1.9.0
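If you prefer not to install an extra package for this step, the versions can also be read with the standard library. The sketch below uses importlib.metadata; the helper name installed_versions and the list of packages passed to it are assumptions chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = 'not installed'
    return found


for name, number in installed_versions(['pandas', 'py4cytoscape']).items():
    print(f'{name}=={number}')
```

Unlike print_versions(globals()), this reports the packages you name explicitly, whether or not they have been imported in the current session.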

Reference:

1. Basic Data Visualization — py4cytoscape 0.0.5 documentation [Internet]. Readthedocs.io. 2021 [cited 2025 Feb 26]. Available from: https://py4cytoscape.readthedocs.io/en/0.0.5/tutorials/basic-data-visualization.html
