Part 5: Visualization of transcriptomics expression datasets in the enriched AOP network part 2

The AOP project ► Key objective 2

Author: Shakira Agata

This Jupyter notebook describes the steps needed to map transcriptomics datasets onto the constructed enriched AOP network. For this notebook, open-license transcriptomics datasets were obtained from ArrayExpress and the Gene Expression Omnibus (GEO). These datasets were preprocessed, and statistical analysis was then performed to identify differentially expressed genes (DEGs). The resulting tables of differential gene expression data were subsequently mapped onto (integrated into) the network. This notebook is subdivided into the following six sections:

  • Section 1: System preparation
  • Section 2: Retrieval of the enriched AOP network
  • Section 3: Adaptation of gene node color within the enriched AOP network
  • Section 4: Mapping of dataset: E-GEOD-69844 (GSE69844)
    • Section 4.1 Bisphenol A
    • Section 4.2 Farnesol
    • Section 4.3 Tetrachlorodibenzo-p-dioxin
    • Section 4.4 Valproic acid
    • Section 4.5 Troglitazone
  • Section 5: Mapping of dataset: E-GEOD-69851
    • Section 5.1 ACR exposure time 1
    • Section 5.2 ACR exposure time 2
    • Section 5.3 MA exposure time 1
    • Section 5.4 MA exposure time 2
    • Section 5.5 CP exposure time 1
    • Section 5.6 CP exposure time 2
  • Section 6: Metadata

Section 1: System preparation

In this section, you will import the packages and tools required for this Jupyter notebook.

step 1: You import pandas, glob and py4cytoscape, check the connection to Cytoscape, and import the style mapping functions of py4cytoscape.

import pandas as pd
import glob
import py4cytoscape as p4c

# Check that Cytoscape is running and reachable, then report the versions in use
p4c.cytoscape_ping()
p4c.cytoscape_version_info()
You are connected to Cytoscape!

{'apiVersion': 'v1',
 'cytoscapeVersion': '3.10.1',
 'automationAPIVersion': '1.9.0',
 'py4cytoscapeVersion': '1.9.0'}
from py4cytoscape import (
    get_node_color, set_node_color_mapping, gen_node_color_map,
    set_edge_color_default, set_node_color_default,
    set_edge_source_arrow_shape_default, set_edge_target_arrow_shape_default,
    get_arrow_shapes, get_edge_target_arrow_shape,
    set_edge_target_arrow_shape_mapping, gen_edge_arrow_map,
    select_nodes, get_table_value, get_network_suid, clear_selection,
    set_node_color_bypass, set_edge_color_bypass,
    set_edge_target_arrow_color_default, set_node_size_bypass,
    create_subnetwork,
)

Section 2: Retrieval of the enriched AOP network

In this section, you will retrieve the enriched AOP network by opening the Cytoscape session that was saved at the end of the previous notebook.

step 2: You open the session you saved in the previous Jupyter notebook.

p4c.open_session('Agata,S.-Part4-Complete Molecular inflammation-process related AOP network.cys')
Opening C:\Users\shaki\Downloads\Agata,S.-Part4-Complete Molecular inflammation-process related AOP network.cys...

{}

Section 3: Adaptation of gene node color within the enriched AOP network

In this section, you will change the node color of genes and adapt the style so that the upcoming results are easier to interpret. This is needed in preparation for the mapping of the transcriptomics datasets: genes in the AOP network that are absent from a dataset receive no expression value, so they keep a distinct (grey) default color and are not mistaken for genes with a measured value.

step 3: You can change the style with the following commands.

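style_name = "default"
# Default visual properties applied to every node and edge in this style
defaults = {'NODE_SHAPE': "ELLIPSE", 'NODE_SIZE': 20, 'EDGE_TRANSPARENCY': 140, 'NODE_LABEL_POSITION': "C,C,c,0.00,0.00"}
# Passthrough ('p') mappings: node labels from the 'name' column, edge widths from the 'weight' column
nodeLabels = p4c.map_visual_property('node label', 'name', 'p')
edgeWidth = p4c.map_visual_property('edge width', 'weight', 'p')
# Discrete ('d') mapping of arrow shapes on 'interaction'; note that it is not included in the style created below
arrowShapes = p4c.map_visual_property('Edge Target Arrow Shape', 'interaction', 'd')
p4c.create_visual_style(style_name, defaults, [nodeLabels, edgeWidth])
p4c.set_visual_style(style_name)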
{'message': 'Visual Style applied.'}
set_node_color_default('#a7a5a5',style_name='default')
set_edge_color_default('#01e735', style_name='default')
''
p4c.clone_network()
p4c.rename_network('clone-GSE69844')
{'network': 235771, 'title': 'clone-GSE69844'}

Section 4: Mapping of dataset: GSE69844

In this section, you will map the transcriptomics expression data of dataset GSE69844. This is done in a similar fashion as in the previous notebook, but the steps are streamlined because of the high number of data files. In preparation for this section, you must first download the data files of the chemicals into separate folders.
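Because the same filter-and-export steps are repeated for every chemical and concentration, the glob module imported in step 1 could be used to find the input files instead of typing each name. The sketch below is a minimal example, assuming the files follow the naming pattern `GSE69844.<Chemical>Concentration<n>.tsv` seen in the code of this section; the helper `discover_expression_files` and its regular expression are hypothetical choices, not part of the original workflow.

```python
import glob
import os
import re

# Hypothetical pattern, inferred from file names such as
# 'GSE69844.FarnesolConcentration3.tsv'; adjust it if your files differ.
PATTERN = re.compile(r"^GSE69844\.(?P<chemical>[A-Za-z]+)Concentration(?P<conc>\d+)\.tsv$")


def discover_expression_files(folder="."):
    """Return {(chemical, concentration): path} for every matching TSV file in folder."""
    found = {}
    for path in sorted(glob.glob(os.path.join(folder, "GSE69844.*.tsv"))):
        match = PATTERN.match(os.path.basename(path))
        if match:  # names outside the pattern, e.g. 'GSE69844.BisphenolA-1uM.tsv', are skipped
            found[(match["chemical"], int(match["conc"]))] = path
    return found


# Usage: list what would be processed in the current working directory
for (chemical, concentration), path in discover_expression_files().items():
    print(chemical, concentration, path)
```

If the files are stored in separate folders per chemical, the function can be called once per folder. Files that do not follow the pattern, such as the Bisphenol A concentration 1 file, still need to be handled by hand.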

4.1 Bisphenol A

4.1.1 Concentration 1

step 4: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

BisphenolA_concentration1= pd.read_csv('GSE69844.BisphenolA-1uM.tsv',sep='\t')
Adjusted_BisphenolA_concentration1= BisphenolA_concentration1[BisphenolA_concentration1['padj'] < 0.05]
BisphenolA_concentration_1=Adjusted_BisphenolA_concentration1.drop('ID', axis=1)
BisphenolA_Concentration1 = BisphenolA_concentration_1[['Entrez.Gene','Gene.Symbol','padj', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration1.to_excel('GSE69844.BisphenolA-concentration1.xlsx',index=False)
BisphenolA_Concentration1

Entrez.Gene Gene.Symbol padj t B logFC GB_LIST SPOT_ID
0 28996 HIPK2 0.000026 -12.821942 12.209372 -0.593550 NM_001113239,NM_022740,XM_001716827,XM_925800 NaN
1 87 ACTN1 0.000026 -12.667672 12.062264 -0.585774 NM_001102,NM_001130004,NM_001130005 NaN
2 1455 CSNK1G2 0.000029 -12.228128 11.631158 -0.489035 NM_001319 NaN
3 2316 FLNA 0.000038 -11.657262 11.043619 -0.444861 NM_001110556,NM_001456 NaN
4 23524 SRRM2 0.000038 -11.409780 10.778745 -0.554962 NM_016333 NaN
... ... ... ... ... ... ... ... ...
3472 7485 WRB 0.049943 3.424377 -1.958139 0.136566 NM_001146218,NM_004627 NaN
3473 81607 PVRL4 0.049943 -3.424351 -1.958192 -0.152348 NM_030916 NaN
3474 91057 CCDC34 0.049943 3.424349 -1.958195 0.170781 NM_030771,NM_080654 NaN
3475 79780 CCDC82 0.049976 3.423893 -1.959112 0.138349 NM_024725 NaN
3476 2108 ETFA 0.049982 3.423700 -1.959501 0.142652 NM_000126,NM_001127716 NaN

3477 rows × 8 columns
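The table for Bisphenol A concentration 1 names its adjusted p-value column padj, whereas the other tables in this section use adj.P.Val. A small helper that accepts either name keeps the filtering step identical across files. This is a sketch; `filter_significant` is a hypothetical name, and the 0.05 cut-off is the one used throughout this section.

```python
import pandas as pd

# Adjusted p-value column names used by the tables in this section
PVALUE_COLUMNS = ("adj.P.Val", "padj")


def filter_significant(table, alpha=0.05):
    """Keep rows whose adjusted p-value is below alpha and drop the probe 'ID' column."""
    for column in PVALUE_COLUMNS:
        if column in table.columns:
            significant = table[table[column] < alpha]
            # errors="ignore" because not every export necessarily has an 'ID' column
            return significant.drop(columns=["ID"], errors="ignore")
    raise KeyError(f"none of {PVALUE_COLUMNS} found in the table")
```

For example, `filter_significant(pd.read_csv('GSE69844.BisphenolA-1uM.tsv', sep='\t'))` would reproduce the filtering of step 4 before the columns are reordered.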

step 5: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.GeneID' for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='Agata,S.-Part4-Molecular inflammation-process related AOP network')
{'mappedTables': [388893, 388931]}
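Whether a node actually receives a value depends on the keys matching. Assuming the py4cytoscape default data_key_column_index=1, load_table_data_from_file takes its keys from the first column of the file (here Entrez.Gene; in the later tables Gene.Symbol) and compares them with the values in table_key_column. The pandas sketch below, with the hypothetical helper `key_overlap`, counts how many keys would match before anything is loaded.

```python
import pandas as pd


def key_overlap(deg_table, network_keys):
    """Compare the first column of a DEG table with the key values of the network.

    Returns (number of matching keys, number of distinct DEG keys, sorted matches).
    Keys are compared as strings; float-typed IDs such as 28996.0 would need
    converting to integers first, otherwise they will not match '28996'.
    """
    deg_keys = set(deg_table.iloc[:, 0].dropna().astype(str))
    net_keys = set(pd.Series(network_keys).dropna().astype(str))
    matched = deg_keys & net_keys
    return len(matched), len(deg_keys), sorted(matched)
```

In the notebook, network_keys could for instance be the column returned by `p4c.get_table_columns(table='node', columns='CTL.GeneID')`; we have not tested this against the session used here. A count of zero would indicate that the identifier types differ, for example gene symbols compared with Entrez IDs.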

step 6: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 7: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
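The three breakpoints can be reproduced with plain pandas, which makes it easy to check them before applying the mapping. In the sketch below (`color_breakpoints` is a hypothetical helper), missing values are skipped, just as DataFrame.min() and DataFrame.max() skip them in the code above. White is placed halfway between the minimum and maximum, which coincides with logFC = 0 only when the range is symmetric; the symmetric=True option shows one alternative that centres white on zero, which is not what this notebook does.

```python
import pandas as pd


def color_breakpoints(logfc, symmetric=False):
    """Return (blue, white, red) breakpoints for a continuous node color mapping.

    symmetric=False reproduces the notebook: white sits halfway between min and max.
    symmetric=True is an alternative (not used in this notebook) that puts white at 0.
    """
    values = pd.Series(logfc, dtype=float).dropna()  # nodes without data are NaN
    low, high = values.min(), values.max()
    if symmetric:
        bound = max(abs(low), abs(high))
        return -bound, 0.0, bound
    return low, low + (high - low) / 2, high
```

In the notebook this would be called as `color_breakpoints(Log2Foldchange_column['logFC'])`.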

step 8: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
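With mapping_type='c', set_node_color_mapping creates a continuous mapping, so logFC values between two breakpoints receive intermediate colors. The function below approximates this under our assumption of linear interpolation of the red, green and blue channels, with values outside the range getting the end colors; Cytoscape's own rendering may differ in detail. `interpolate_color` is a hypothetical helper for checking roughly which color a given logFC should get.

```python
import math


def _hex_to_rgb(hex_color):
    """'#FF8000' -> [255, 128, 0]"""
    return [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]


def interpolate_color(value, points, colors):
    """Approximate the color a continuous mapping assigns to value.

    points: ascending breakpoints, e.g. the (blue, white, red) values of step 7
    colors: matching hex colors, e.g. ('#0000FF', '#FFFFFF', '#FF0000')
    Assumes linear RGB interpolation; values outside the range get the end colors.
    Returns None for NaN, i.e. a node without logFC, which keeps the default color.
    """
    if math.isnan(value):
        return None
    if value <= points[0]:
        return colors[0].upper()
    if value >= points[-1]:
        return colors[-1].upper()
    for x0, x1, c0, c1 in zip(points, points[1:], colors, colors[1:]):
        if x0 <= value <= x1:
            t = (value - x0) / (x1 - x0)  # relative position within this segment
            rgb = [round(a + (b - a) * t) for a, b in zip(_hex_to_rgb(c0), _hex_to_rgb(c1))]
            return "#{:02X}{:02X}{:02X}".format(*rgb)
    raise ValueError("points must be in ascending order")
```

For instance, with breakpoints (-1.0, 0.0, 1.0) and the blue/white/red colors of step 8, a logFC of 0.5 falls halfway between white and red and comes out as a light red.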
4.1.2 Concentration 2

step 9: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

BisphenolA_concentration2= pd.read_csv('GSE69844.BisphenolAConcentration2.tsv',sep='\t')
Adjusted_BisphenolA_concentration2= BisphenolA_concentration2[BisphenolA_concentration2['adj.P.Val'] < 0.05]
BisphenolA_concentration_2=Adjusted_BisphenolA_concentration2.drop('ID', axis=1)
BisphenolA_Concentration2 = BisphenolA_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration2.to_excel('GSE69844.BisphenolA-concentration2.xlsx',index=False)
BisphenolA_Concentration2

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 COL8A1 0.0172 -9.304198 3.89521 -0.381 NM_001850,NM_020351 NaN
1 S100A2 0.0414 8.109804 3.14555 0.330 NM_005978 NaN

step 10: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.GeneID' for table_key_column, as you want the expression data to be matched to the gene IDs that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID')
{'mappedTables': [388893, 388931]}
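Each load writes into the same logFC column. If, as we assume, the table import only overwrites rows whose key matches and leaves the other rows untouched, genes that were significant at concentration 1 but are absent from the concentration 2 table keep their old values and colors. The pandas model below (`simulate_import` is hypothetical and only mimics this assumed behaviour, not Cytoscape's implementation) shows the effect and why clearing the column before the next import avoids it.

```python
import pandas as pd


def simulate_import(node_table, deg_table, key, column="logFC"):
    """Model of a table import that only updates rows whose key matches.

    This mimics the behaviour we assume for the Cytoscape import; it is not
    Cytoscape's implementation. deg_table must have unique keys.
    """
    updated = node_table.copy()
    new_values = deg_table.set_index(key)[column]
    matched = updated[key].isin(new_values.index)
    # Only matched rows are written; all other rows keep whatever they had
    updated.loc[matched, column] = updated.loc[matched, key].map(new_values)
    return updated
```

In the notebook itself, one option would be to remove the old column with `p4c.delete_table_column('logFC')` before loading the next file; we have not tested this against the session used here.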

step 11: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 12: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 13: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.1.3 Concentration 3

step 14: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

BisphenolA_concentration3= pd.read_csv('GSE69844.BisphenolAConcentration3.tsv',sep='\t')
Adjusted_BisphenolA_concentration3= BisphenolA_concentration3[BisphenolA_concentration3['adj.P.Val'] < 0.05]
BisphenolA_concentration_3=Adjusted_BisphenolA_concentration3.drop('ID', axis=1)
BisphenolA_Concentration3 = BisphenolA_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration3.to_excel('GSE69844.BisphenolA-concentration3.xlsx',index=False)
BisphenolA_Concentration3

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 COL8A1 0.000002 -15.333380 15.028626 -0.649 NM_001850,NM_020351 NaN
1 ZBTB16 0.000003 -14.358816 14.190854 -0.754 NM_001018011,NM_006006 NaN
2 PDE4DIP 0.000003 -14.169566 14.020061 -0.640 NM_001002810,NM_001002811,NM_001002812,NM_0146... NaN
3 GGT5 0.000003 -13.645523 13.532532 -0.596 NM_001099781,NM_001099782,NM_004121 NaN
4 ZFP36L2 0.000003 -13.583284 13.473166 -0.520 NM_006887 NaN
... ... ... ... ... ... ... ...
5726 PSAT1 0.049904 3.176887 -2.554956 0.136 NM_021154,NM_058179 NaN
5727 PXN 0.049921 3.176598 -2.555540 0.170 NM_001080855,NM_002859,NM_025157 NaN
5728 SET 0.049921 -3.176562 -2.555612 -0.125 NM_001122821,NM_003011 NaN
5729 SIGMAR1 0.049943 -3.176276 -2.556190 -0.153 NM_005866,NM_147157 NaN
5730 SEMA3B 0.049965 -3.175985 -2.556778 -0.133 NM_001005914,NM_004636 NaN

5731 rows × 7 columns

step 15: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.geneID' for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 16: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 17: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 18: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''

4.2 Farnesol

4.2.1 Concentration 1

step 19: You first import the expression table into a new dataframe and filter it for significant genes (adjusted p-value < 0.05). Concentration 1 of Farnesol does not yield any significant genes and therefore cannot be mapped onto the AOP network.

Farnesol_concentration1= pd.read_csv('GSE69844.FarnesolConcentration1.tsv',sep='\t')
Adjusted_Farnesol_concentration1= Farnesol_concentration1[Farnesol_concentration1['adj.P.Val'] < 0.05]
Farnesol_concentration_1=Adjusted_Farnesol_concentration1.drop('ID', axis=1)
Farnesol_Concentration1 = Farnesol_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration1.to_excel('GSE69844.Farnesol-concentration1.xlsx',index=False)
Farnesol_Concentration1

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
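For this concentration the filtered table is empty, so there is nothing to map. When the steps are automated, a guard such as the hypothetical `tables_to_map` below can separate tables worth loading from those to skip, instead of relying on inspecting each printed table by eye. This is a sketch under the assumption that the filtered tables are kept in a dictionary keyed by a label of your choice.

```python
import pandas as pd


def tables_to_map(filtered_tables):
    """Split {label: filtered DataFrame} into labels to map and labels to skip.

    A table is skipped when filtering left no row with a logFC value,
    as for Farnesol concentrations 1 and 2 in this section.
    """
    to_map, skipped = [], []
    for label, table in filtered_tables.items():
        if "logFC" in table.columns and table["logFC"].notna().any():
            to_map.append(label)
        else:
            skipped.append(label)
    return to_map, skipped
```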
4.2.2 Concentration 2

step 20: You first import the expression table into a new dataframe and filter it for significant genes (adjusted p-value < 0.05). Concentration 2 of Farnesol does not yield any significant genes and therefore cannot be mapped onto the AOP network.

Farnesol_concentration2= pd.read_csv('GSE69844.FarnesolConcentration2.tsv',sep='\t')
Adjusted_Farnesol_concentration2= Farnesol_concentration2[Farnesol_concentration2['adj.P.Val'] < 0.05]
Farnesol_concentration_2=Adjusted_Farnesol_concentration2.drop('ID', axis=1)
Farnesol_Concentration2 = Farnesol_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration2.to_excel('GSE69844.Farnesol-concentration2.xlsx',index=False)
Farnesol_Concentration2

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
4.2.3 Concentration 3

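step 21: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.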

Farnesol_concentration3= pd.read_csv('GSE69844.FarnesolConcentration3.tsv',sep='\t')
Adjusted_Farnesol_concentration3= Farnesol_concentration3[Farnesol_concentration3['adj.P.Val'] < 0.05]
Farnesol_concentration_3=Adjusted_Farnesol_concentration3.drop('ID', axis=1)
Farnesol_Concentration3 = Farnesol_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration3.to_excel('GSE69844.Farnesol-concentration3.xlsx',index=False)
Farnesol_Concentration3

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 IGFBP1 0.00101 13.437237 6.97653 0.709 NM_000596 NaN
1 CBX5 0.01410 -9.847783 5.16687 -0.627 NM_001127321,NM_001127322,NM_012117 NaN
2 --- 0.01543 -9.388084 4.85421 -0.569 NaN --AFFX-HUMRGE/M10098_3
3 ANGPTL4 0.03269 8.326084 4.03872 0.701 NM_001039667,NM_139314 NaN
4 FNDC3B 0.03269 -8.294095 4.01196 -0.439 NM_001135095,NM_022763 NaN
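The third row of this table has no gene symbol (---), and its SPOT_ID contains AFFX, the prefix used for Affymetrix control probe sets. Such a row cannot match a gene node. A minimal sketch for removing these rows before export, using the hypothetical helper `drop_unannotated_probes`:

```python
import pandas as pd


def drop_unannotated_probes(deg_table, symbol_column="Gene.Symbol"):
    """Remove rows whose gene symbol is missing or the placeholder '---'."""
    symbols = deg_table[symbol_column]
    keep = symbols.notna() & (symbols.astype(str).str.strip() != "---")
    return deg_table[keep]
```

Applied to this table, it would keep IGFBP1, CBX5, ANGPTL4 and FNDC3B.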

step 22: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.geneID' for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Farnesol-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 23: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
Log2Foldchange_column

logFC
389124 NaN
401415 NaN
393220 NaN
401410 0.153
389120 NaN
... ...
401405 0.158
389112 NaN
401400 NaN
393210 0.370
397305 NaN

2952 rows × 1 columns
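Most entries in this column are NaN: only nodes whose key occurs in a loaded table receive a logFC value, and the others keep the grey default color set in Section 3. Counting both groups gives a quick check of how much of the network a dataset covers. A sketch with the hypothetical helper `mapping_coverage`, which accepts the one-column DataFrame returned by p4c.get_table_columns or a plain Series:

```python
import pandas as pd


def mapping_coverage(logfc):
    """Return (nodes with a logFC value, nodes without one).

    Accepts the one-column DataFrame returned by p4c.get_table_columns or a Series.
    """
    values = logfc.iloc[:, 0] if isinstance(logfc, pd.DataFrame) else logfc
    with_value = int(values.notna().sum())
    return with_value, len(values) - with_value
```

In the notebook this would be called as `mapping_coverage(Log2Foldchange_column)`.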

step 24: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 25: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''

4.3 Tetrachlorodibenzo-p-dioxin

4.3.1 Concentration 1

step 26: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

Tpdioxin_concentration1= pd.read_csv('GSE69844.TpdioxinConcentration1.tsv',sep='\t')
Adjusted_Tpdioxin_concentration1= Tpdioxin_concentration1[Tpdioxin_concentration1['adj.P.Val'] < 0.05]
Tpdioxin_concentration_1=Adjusted_Tpdioxin_concentration1.drop('ID', axis=1)
Tpdioxin_Concentration1 = Tpdioxin_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration1.to_excel('GSE69844.Tpdioxin-concentration1.xlsx',index=False)
Tpdioxin_Concentration1

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 CYP1B1 0.000628 54.309631 3.33321 3.630 NM_000104 NaN
1 CYP1B1 0.002197 37.892762 3.22492 3.550 NM_000104 NaN
2 CYP1B1 0.002271 34.935858 3.18861 3.270 NM_000104 NaN
3 IER3 0.003985 29.839182 3.10077 1.430 NM_003897 NaN
4 CYP1A1 0.004145 27.446373 3.04317 3.280 NM_000499 NaN
5 CYP1B1 0.004145 27.033615 3.03179 3.710 NM_000104 NaN
6 CYP1A1 0.004145 26.696606 3.02214 3.190 NM_000499 NaN
7 HSD17B2 0.017668 19.876908 2.72569 0.923 NM_002153 NaN
8 SLC7A11 0.041721 16.244113 2.42797 0.782 NM_014331 NaN
9 GDF15 /// LOC100292463 0.041721 16.141041 2.41710 0.890 NM_004864,XM_002345162 NaN
10 TIPARP 0.041721 15.940321 2.39544 0.708 NM_001184717,NM_001184718,NM_015508 NaN
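Several probes in this table map to the same gene (CYP1B1 appears four times and CYP1A1 twice), and one probe is annotated with two symbols (GDF15 /// LOC100292463). Since the import matches rows by key, duplicate keys leave it to the import which probe's value a node receives, and a combined symbol does not match either single gene. One common way to resolve this, shown here as a sketch and not as the method used in this notebook, is to split combined symbols and keep the probe with the lowest adjusted p-value per gene. `collapse_probes` is a hypothetical helper.

```python
import pandas as pd


def collapse_probes(deg_table, symbol_column="Gene.Symbol", p_column="adj.P.Val"):
    """Return one row per gene, keeping the probe with the lowest adjusted p-value.

    Combined symbols such as 'GDF15 /// LOC100292463' are split so that each
    gene gets its own row (with the values of the shared probe).
    """
    table = deg_table.copy()
    table[symbol_column] = table[symbol_column].str.split("///")
    table = table.explode(symbol_column).reset_index(drop=True)
    table[symbol_column] = table[symbol_column].str.strip()
    # mergesort is stable, so ties keep their original order
    table = table.sort_values(p_column, kind="mergesort")
    return table.drop_duplicates(subset=symbol_column, keep="first").reset_index(drop=True)
```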

step 27: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.geneID' for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 28: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 29: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 30: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.3.2 Concentration 2

step 31: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

Tpdioxin_concentration2= pd.read_csv('GSE69844.TpdioxinConcentration2.tsv',sep='\t')
Adjusted_Tpdioxin_concentration2= Tpdioxin_concentration2[Tpdioxin_concentration2['adj.P.Val'] < 0.05]
Tpdioxin_concentration_2=Adjusted_Tpdioxin_concentration2.drop('ID', axis=1)
Tpdioxin_Concentration2 = Tpdioxin_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration2.to_excel('GSE69844.Tpdioxin-concentration2.xlsx',index=False)
Tpdioxin_Concentration2

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 CYP1B1 0.00381 34.144660 -1.85 3.223325 NM_000104 NaN
1 CYP1B1 0.00381 32.206372 -1.86 3.642847 NM_000104 NaN
2 CYP1B1 0.00978 25.244475 -1.87 3.359260 NM_000104 NaN

step 32: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.geneID' for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 33: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 34: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 35: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.3.3 Concentration 3

step 36: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

Tpdioxin_concentration3= pd.read_csv('GSE69844.TpdioxinConcentration3.tsv',sep='\t')
Adjusted_Tpdioxin_concentration3= Tpdioxin_concentration3[Tpdioxin_concentration3['adj.P.Val'] < 0.05]
Tpdioxin_concentration_3=Adjusted_Tpdioxin_concentration3.drop('ID', axis=1)
Tpdioxin_Concentration3 = Tpdioxin_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration3.to_excel('GSE69844.Tpdioxin-concentration3.xlsx',index=False)
Tpdioxin_Concentration3

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 CYP1B1 0.000035 66.355926 7.262273 3.951611 NM_000104 NaN
1 CYP1B1 0.000050 55.772416 7.162770 3.734094 NM_000104 NaN
2 CYP1A1 0.000060 50.590763 7.091598 3.380474 NM_000499 NaN
3 CYP1B1 0.000069 47.138520 7.031778 3.502514 NM_000104 NaN
4 SERPINB2 0.000145 40.135573 6.865297 3.556188 NM_001143818,NM_002575 NaN
... ... ... ... ... ... ... ...
145 MPHOSPH6 0.049924 8.406946 1.666651 0.409146 NM_005792 NaN
146 RUNX1 0.049924 8.402542 1.663899 0.420556 NM_001001890,NM_001122607,NM_001754 NaN
147 A1CF 0.049924 -8.400467 1.662601 -0.490591 NM_014576,NM_138932,NM_138933 NaN
148 ABLIM1 0.049924 -8.394971 1.659163 -0.487655 NM_001003407,NM_001003408,NM_002313,NM_006720 NaN
149 GNA13 0.049924 8.386056 1.653580 0.544947 NM_006572 NaN

150 rows × 7 columns

step 37: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.geneID' for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 38: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 39: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 40: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''

4.4 Valproic acid

4.4.1 Concentration 1

step 41: You first import the expression table into a new dataframe and keep only the genes with an adjusted p-value below 0.05. The filtered table is then saved as an Excel file so that it can be used as input for the py4cytoscape function.

Valproicacid_concentration1= pd.read_csv('GSE69844.ValproicacidConcentration1.tsv',sep='\t')
Adjusted_Valproicacid_concentration1= Valproicacid_concentration1[Valproicacid_concentration1['adj.P.Val'] < 0.05]
Valproicacid_concentration_1=Adjusted_Valproicacid_concentration1.drop('ID', axis=1)
Valproicacid_Concentration1 = Valproicacid_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration1.to_excel('GSE69844.Valproicacid-concentration1.xlsx',index=False)
Valproicacid_Concentration1

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 CXorf26 0.00223 -28.40 8.250030 -1.090 NM_016500 NaN
1 ONECUT2 0.00223 26.20 7.967265 1.310 NM_004852 NaN
2 KEAP1 0.00275 -22.30 7.312031 -1.190 NM_012289,NM_203500 NaN
3 ITPRIPL2 0.00275 -22.30 7.302028 -0.990 NM_001034841,NR_028028 NaN
4 TMEM170B 0.00275 22.00 7.241510 1.470 NM_001100829 NaN
... ... ... ... ... ... ... ...
4499 MTF2 0.04987 4.26 -2.075831 0.294 NM_001164391,NM_001164392,NM_001164393,NM_007358 NaN
4500 CDK5RAP1 0.04994 4.26 -2.077623 0.230 NM_016082,NM_016408 NaN
4501 DPF2 0.04994 -4.26 -2.077706 -0.233 NM_006268 NaN
4502 ROR2 0.04998 4.26 -2.078890 0.217 NM_004560 NaN
4503 SLC9A8 0.04998 -4.26 -2.079096 -0.222 NM_015266 NaN

4504 rows × 7 columns

step 42: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the exported table contains the column names. You also select 'node' for table and 'CTL.geneID' for table_key_column, as you want the expression data to be matched to the gene identifiers that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Valproicacid-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 43: You define the style for the mapping so that the expression values (log2 fold changes, column logFC) are mapped to the gene nodes. This is done by first retrieving the logFC column.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 44: This is followed by the definition of the color scheme, so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:

  • Low expression value (minimum) = blue node color
  • Value halfway between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 45: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.4.2 Concentration 2

step 46: You first import the expression table into a new dataframe and filter it for significant genes (adjusted p-value < 0.05). Concentration 2 of Valproic acid does not yield any significant genes and therefore cannot be mapped onto the AOP network.

Valproicacid_concentration2= pd.read_csv('GSE69844.ValproicacidConcentration2.tsv',sep='\t')
Adjusted_Valproicacid_concentration2= Valproicacid_concentration2[Valproicacid_concentration2['adj.P.Val'] < 0.05]
Valproicacid_concentration_2=Adjusted_Valproicacid_concentration2.drop('ID', axis=1)
Valproicacid_Concentration2 = Valproicacid_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration2.to_excel('GSE69844.Valproicacid-concentration2.xlsx',index=False)
Valproicacid_Concentration2

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
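Because an empty table has nothing to map, a simple guard can skip the export and mapping steps for conditions without significant genes. The sketch below counts the rows below the significance threshold; the helper name count_significant and the toy table are hypothetical, and the column name and threshold match the filter used above.

```python
import pandas as pd

def count_significant(table: pd.DataFrame, p_col: str = "adj.P.Val",
                      alpha: float = 0.05) -> int:
    """Number of rows whose adjusted p-value is below alpha."""
    return int((table[p_col] < alpha).sum())

# Toy table in which no gene passes the threshold.
toy = pd.DataFrame({"Gene.Symbol": ["A", "B"], "adj.P.Val": [0.20, 0.51]})

if count_significant(toy) == 0:
    print("No significant genes; skipping the mapping step.")
```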
3.5.3 Concentration 3

step 47: You first import the expression table into a new dataframe and filter it for significant genes (adj.P.Val < 0.05). Concentration 3 of valproic acid yields no significant genes, as the empty table below shows, and therefore cannot be mapped onto the AOP network.

Valproicacid_concentration3= pd.read_csv('GSE69844.ValproicacidConcentration3.tsv',sep='\t')
Adjusted_Valproicacid_concentration3= Valproicacid_concentration3[Valproicacid_concentration3['adj.P.Val'] < 0.05]
Valproicacid_concentration_3=Adjusted_Valproicacid_concentration3.drop('ID', axis=1)
Valproicacid_Concentration3 = Valproicacid_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration3.to_excel('GSE69844.Valproicacid-concentration3.xlsx',index=False)
Valproicacid_Concentration3

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID

4.7 Troglitazone

4.7.1 Concentration 1

step 48: You first import the expression table into a new dataframe, keep only the significant genes (adj.P.Val < 0.05), drop the ID column, select the columns needed for the mapping, and save the result as an Excel file that can be used as input for the py4cytoscape function.

Troglitazone_concentration1= pd.read_csv('GSE69844.TroglitazoneConcentration1.tsv',sep='\t')
Adjusted_Troglitazone_concentration1= Troglitazone_concentration1[Troglitazone_concentration1['adj.P.Val'] < 0.05]
Troglitazone_concentration_1=Adjusted_Troglitazone_concentration1.drop('ID', axis=1)
Troglitazone_Concentration1 = Troglitazone_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration1.to_excel('GSE69844.Troglitazone-concentration1.xlsx',index=False)
Troglitazone_Concentration1

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 FABP4 2.170000e-07 26.70 14.69249 2.278158 NM_001442 NaN
1 CSNK1G2 1.510000e-04 -14.40 10.27064 -0.585872 NM_001319 NaN
2 ACTN1 2.410000e-04 -13.30 9.59510 -0.580898 NM_001102,NM_001130004,NM_001130005 NaN
3 COL8A1 3.770000e-04 -12.50 9.00330 -0.629488 NM_001850,NM_020351 NaN
4 SRRM2 3.990000e-04 -12.20 8.77591 -0.730824 NM_016333 NaN
... ... ... ... ... ... ... ...
3118 PHF20 4.980000e-02 -3.68 -1.74472 -0.165499 NM_016436 NaN
3119 CTGF 4.980000e-02 -3.68 -1.74515 -0.219036 NM_001901 NaN
3120 ADNP2 4.980000e-02 -3.68 -1.74565 -0.189624 NM_014913 NaN
3121 ATP2A2 4.990000e-02 -3.68 -1.74633 -0.165539 NM_001135765,NM_001681,NM_170665 NaN
3122 C1orf198 4.990000e-02 -3.68 -1.74677 -0.194524 NM_001136494,NM_001136495,NM_032800 NaN

3123 rows × 7 columns
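The same read, filter and select sequence is repeated for every compound, concentration and time point. The sketch below gathers it into one reusable function. The name prepare_deg_table and the toy input are hypothetical; it returns the DataFrame and leaves writing the Excel file to the caller.

```python
import io
import pandas as pd

def prepare_deg_table(source, p_col="adj.P.Val", alpha=0.05,
                      columns=("Gene.Symbol", "adj.P.Val", "t", "B",
                               "logFC", "GB_LIST", "SPOT_ID")):
    """Read a tab-separated expression table, keep significant rows and selected columns."""
    table = pd.read_csv(source, sep="\t")
    significant = table[table[p_col] < alpha]
    # Selecting the columns directly also removes the ID column,
    # so a separate drop('ID', axis=1) is not needed.
    return significant[list(columns)]

# Toy input with the same header as the GSE69844 tables read above.
toy_tsv = io.StringIO(
    "ID\tGene.Symbol\tadj.P.Val\tt\tB\tlogFC\tGB_LIST\tSPOT_ID\n"
    "1\tFABP4\t0.001\t26.7\t14.7\t2.28\tNM_001442\t\n"
    "2\tGENE_X\t0.300\t1.0\t-4.0\t0.10\tNM_000000\t\n"
)
deg = prepare_deg_table(toy_tsv)
print(deg)
```

In the notebook this could be used as, for example, prepare_deg_table('GSE69844.TroglitazoneConcentration1.tsv').to_excel('GSE69844.Troglitazone-concentration1.xlsx', index=False). For the GSE44729 tables in Section 5 you would pass p_col='padj' and columns=('GENE_SYMBOL', 'padj', 't', 'B', 'logFC', 'SPOT_ID').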

step 49: You now integrate the expression table into the node table of the AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the troglitazone (concentration 1) table contains the column names. You also set table='node' and table_key_column='CTL.geneID', so that the expression data are matched to the gene identifiers that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 50: You define the style mapping so that the expression values (log2 fold changes, stored in the logFC column) are mapped to the gene nodes. You start by retrieving the logFC column from the node table.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 51: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:

  • Low expression value (minimum) = blue node color
  • Midpoint between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 52: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.7.2 Concentration 2

step 53: You first import the expression table into a new dataframe, keep only the significant genes (adj.P.Val < 0.05), drop the ID column, select the columns needed for the mapping, and save the result as an Excel file that can be used as input for the py4cytoscape function.

Troglitazone_concentration2= pd.read_csv('GSE69844.TroglitazoneConcentration2.tsv',sep='\t')
Adjusted_Troglitazone_concentration2= Troglitazone_concentration2[Troglitazone_concentration2['adj.P.Val'] < 0.05]
Troglitazone_concentration_2=Adjusted_Troglitazone_concentration2.drop('ID', axis=1)
Troglitazone_Concentration2 = Troglitazone_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration2.to_excel('GSE69844.Troglitazone-concentration2.xlsx',index=False)
Troglitazone_Concentration2

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 FABP4 1.000000e-12 57.329825 13.74886 3.351827 NM_001442 NaN
1 PLIN4 2.340000e-05 14.969008 9.66342 0.709311 NM_001080400 NaN
2 ATP2B4 5.990000e-05 13.451635 8.95408 0.550716 NM_001001396,NM_001684 NaN
3 PDK4 5.990000e-05 13.146994 8.79403 1.124497 NM_002612 NaN
4 DLC1 9.970000e-05 12.396132 8.37088 0.527070 NM_001164271,NM_006094,NM_024767,NM_182643 NaN
... ... ... ... ... ... ... ...
69 TXNIP 4.420000e-02 5.718843 1.90694 0.311458 NM_006472 NaN
70 PHLDA3 4.420000e-02 5.712639 1.89787 0.227240 NM_012396 NaN
71 ATP1B1 4.600000e-02 -5.680871 1.85131 -0.231002 NM_001001787,NM_001677 NaN
72 ANKRD1 4.690000e-02 -5.661073 1.82221 -0.295793 NM_014391 NaN
73 ZBED3 4.920000e-02 5.625298 1.76945 0.301710 NM_032367 NaN

74 rows × 7 columns

step 54: You now integrate the expression table into the node table of the AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the troglitazone (concentration 2) table contains the column names. You also set table='node' and table_key_column='CTL.geneID', so that the expression data are matched to the gene identifiers that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 55: You define the style mapping so that the expression values (log2 fold changes, stored in the logFC column) are mapped to the gene nodes. You start by retrieving the logFC column from the node table.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 56: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:

  • Low expression value (minimum) = blue node color
  • Midpoint between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 57: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.7.3 Concentration 3

step 58: You first import the expression table into a new dataframe, keep only the significant genes (adj.P.Val < 0.05), drop the ID column, select the columns needed for the mapping, and save the result as an Excel file that can be used as input for the py4cytoscape function.

Troglitazone_concentration3= pd.read_csv('GSE69844.TroglitazoneConcentration3.tsv',sep='\t')
Adjusted_Troglitazone_concentration3= Troglitazone_concentration3[Troglitazone_concentration3['adj.P.Val'] < 0.05]
Troglitazone_concentration_3=Adjusted_Troglitazone_concentration3.drop('ID', axis=1)
Troglitazone_Concentration3 = Troglitazone_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration3.to_excel('GSE69844.Troglitazone-concentration3.xlsx',index=False)
Troglitazone_Concentration3

Gene.Symbol adj.P.Val t B logFC GB_LIST SPOT_ID
0 FABP4 2.190000e-17 63.189122 33.180151 3.760460 NM_001442 NaN
1 PDK4 8.910000e-12 27.858041 25.280555 1.669789 NM_002612 NaN
2 KLF9 8.910000e-12 -27.795929 25.251639 -1.271744 NM_001206 NaN
3 INSIG1 8.910000e-12 27.633020 25.175329 1.021568 NM_005542,NM_198336,NM_198337 NaN
4 CYP1B1 4.890000e-11 24.684925 23.668933 1.327687 NM_000104 NaN
... ... ... ... ... ... ... ...
9171 RPA2 4.990000e-02 2.923957 -3.330818 0.110598 NM_002946 NaN
9172 INHBE 4.990000e-02 2.923939 -3.330855 0.214330 NM_031479 NaN
9173 C5orf41 4.990000e-02 -2.923657 -3.331433 -0.113331 NM_001168393,NM_001168394,NM_153607 NaN
9174 C7orf68 4.990000e-02 2.923642 -3.331465 0.281327 NM_001098786,NM_013332 NaN
9175 RPL24 4.990000e-02 2.923395 -3.331971 0.112186 NM_000986 NaN

9176 rows × 7 columns

step 59: You now integrate the expression table into the node table of the AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the troglitazone (concentration 3) table contains the column names. You also set table='node' and table_key_column='CTL.geneID', so that the expression data are matched to the gene identifiers that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

step 60: You define the style mapping so that the expression values (log2 fold changes, stored in the logFC column) are mapped to the gene nodes. You start by retrieving the logFC column from the node table.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')

step 61: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:

  • Low expression value (minimum) = blue node color
  • Midpoint between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 62: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''

Section 5: Mapping of dataset: GSE44729

In this section, you will map the transcriptomics expression data of dataset GSE44729. This dataset aimed to transcriptionally profile BEAS-2B cells, comparing controls with skin sensitizers, controls with respiratory sensitizers, and controls with non-sensitizing irritants.

5.1 ACR exposure time 1

step 63: You first import the expression table into a new dataframe.

ACR_10h= pd.read_csv('adaptedACR10h.tsv',sep='\t')
ACR_10h

ID COL ROW NAME SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL ... SEQUENCE SPOT_ID.1 ORDER logFC AveExpr t P.Value padj B ENSEMBLE_GENE_ID
0 30939 85 3 A_24_P102821 A_24_P102821 False NM_000952 NM_000952 5724 PTAFR ... ATACGGTCACTGAAGTGGTTGTGCCATTCAACCAGATCCCTGGCAA... NaN 30939 0.472665 0.049355 5.812795 2.199154e-07 0.005924 5.278602 ENSG00000169403

1 rows × 29 columns

step 64: Unfortunately, the PTAFR gene is not included in the AOP network and thus can’t be mapped.

p4c.load_table_data_from_file('adaptedACR10h.tsv', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}

5.2 ACR exposure time 2

step 65: You first import the expression table into a new dataframe.

ACR_24h= pd.read_csv('adaptedACR24h.tsv',sep='\t')
ACR_24h

ID COL ROW NAME SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL ... SEQUENCE SPOT_ID.1 ORDER logFC AveExpr t P.Value padj B ENSEMBLE_GENE_ID
0 42685 15 142 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 42685 1.179039 0.954675 6.285084 3.447616e-08 0.000253 6.360406 ENSG00000100292
1 40823 26 126 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 40823 1.166378 0.956092 6.242149 4.085234e-08 0.000253 6.263043 ENSG00000100292
2 9749 209 54 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 9749 1.173634 0.954391 6.235767 4.189503e-08 0.000253 6.248567 ENSG00000100292
3 29347 94 127 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 29347 1.174612 0.964019 6.234992 4.202344e-08 0.000253 6.246809 ENSG00000100292
4 43478 11 85 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 43478 1.181977 0.956175 6.228517 4.311172e-08 0.000253 6.232119 ENSG00000100292
5 36362 53 37 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 36362 1.180659 0.960191 6.218832 4.479195e-08 0.000253 6.210143 ENSG00000100292
6 4189 242 123 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 4189 1.170264 0.957791 6.176783 5.287237e-08 0.000253 6.114708 ENSG00000100292
7 20842 144 137 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 20842 1.165382 0.963625 6.148413 5.912487e-08 0.000253 6.050293 ENSG00000100292
8 6877 226 18 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 6877 1.166841 0.951196 6.142273 6.057178e-08 0.000253 6.036349 ENSG00000100292
9 16256 171 129 A_23_P120883 A_23_P120883 False NM_002133 NM_002133 3162 HMOX1 ... TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... NaN 16256 1.162878 0.964791 6.137396 6.174589e-08 0.000253 6.025273 ENSG00000100292
10 26200 112 132 A_32_P42684 A_32_P42684 False NM_014331 NM_014331 23657 SLC7A11 ... TTACTGATACTAAATGTTGGCTACCTGTGATTTTATAGTATGCACA... NaN 26200 0.752090 0.640196 5.433205 9.494058e-07 0.003561 4.425625 ENSG00000151012
11 21063 143 35 A_23_P212655 A_23_P212655 False NM_130446 NM_130446 89857 KLHL6 ... TTCTGGTCTCAATGGCTTCGGGAAACACACATATACACATACACCA... NaN 21063 -0.748734 -0.040346 -4.660551 1.681000e-05 0.031843 2.698548 ENSG00000172578
12 28950 96 72 A_23_P313828 A_23_P313828 False NM_181716 NM_181716 201161 CENPV ... TTTGACTGCAATTGCAGCATTTGCAAGAAGAAGCAGAATAGACACT... NaN 28950 -0.234981 0.018625 -4.523276 2.750745e-05 0.042802 2.398681 ENSG00000166582
13 18739 156 94 A_23_P25487 A_23_P25487 False NM_018018 NM_018018 55089 SLC38A4 ... TGTTCTGGTCATCCTTGTGCCAACTATAAAATACATCTTCGGATTC... NaN 18739 0.293797 -0.025377 4.464863 3.385527e-05 0.047625 2.271960 ENSG00000139209
14 21386 141 69 A_23_P163402 A_23_P163402 False NM_000499 NM_000499 1543 CYP1A1 ... GGTAAAACAGGGCCACATAGATGCTGATGGAGCCTTCCCAAGTTGT... NaN 21386 -0.234158 0.184573 -4.450278 3.565014e-05 0.047916 2.240408 ENSG00000140465
15 32586 75 109 A_32_P165477 A_32_P165477 False NM_014331 NM_014331 23657 SLC7A11 ... CATTTTGCTTTCCTAACCATTCAGTCAGGAATTAAAATATGGCATT... NaN 32586 0.758739 0.723792 4.431768 3.806166e-05 0.048953 2.200414 ENSG00000151012

16 rows × 29 columns

step 66: You filter the expression table for significant genes (padj < 0.05), drop the ID column, keep only the columns needed for the mapping, and save the result as an Excel file that can be used as input for the py4cytoscape function.

ACR24h= pd.read_csv('adaptedACR24h.tsv',sep='\t')
ACR24h_version1= ACR24h[ACR24h['padj'] < 0.05]
ACR24h_version2=ACR24h_version1.drop('ID', axis=1)
ACR24h_version3= ACR24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
ACR24h_version3.to_excel('ACR24h-adjusted.xlsx',index=False)
ACR24h_version3

GENE_SYMBOL padj t B logFC SPOT_ID
0 HMOX1 0.000253 6.285084 6.360406 1.179039 A_23_P120883
1 HMOX1 0.000253 6.242149 6.263043 1.166378 A_23_P120883
2 HMOX1 0.000253 6.235767 6.248567 1.173634 A_23_P120883
3 HMOX1 0.000253 6.234992 6.246809 1.174612 A_23_P120883
4 HMOX1 0.000253 6.228517 6.232119 1.181977 A_23_P120883
5 HMOX1 0.000253 6.218832 6.210143 1.180659 A_23_P120883
6 HMOX1 0.000253 6.176783 6.114708 1.170264 A_23_P120883
7 HMOX1 0.000253 6.148413 6.050293 1.165382 A_23_P120883
8 HMOX1 0.000253 6.142273 6.036349 1.166841 A_23_P120883
9 HMOX1 0.000253 6.137396 6.025273 1.162878 A_23_P120883
10 SLC7A11 0.003561 5.433205 4.425625 0.752090 A_32_P42684
11 KLHL6 0.031843 -4.660551 2.698548 -0.748734 A_23_P212655
12 CENPV 0.042802 -4.523276 2.398681 -0.234981 A_23_P313828
13 SLC38A4 0.047625 4.464863 2.271960 0.293797 A_23_P25487
14 CYP1A1 0.047916 -4.450278 2.240408 -0.234158 A_23_P163402
15 SLC7A11 0.048953 4.431768 2.200414 0.758739 A_32_P165477
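The filtered table contains ten rows for HMOX1 and two for SLC7A11, one per microarray probe. Since the import matches rows to nodes by gene, several rows compete for the same node, and which of them ends up in the node table is not stated in this notebook. The sketch below makes that choice explicit by keeping, for each gene, the probe with the smallest adjusted p-value. This is one possible rule (averaging logFC per gene with groupby would be another); the excerpt reuses four rows of the table above.

```python
import pandas as pd

# Excerpt of ACR24h_version3: several probes map to the same gene.
probes = pd.DataFrame({
    "GENE_SYMBOL": ["HMOX1", "HMOX1", "SLC7A11", "SLC7A11"],
    "padj": [0.000253, 0.000253, 0.003561, 0.048953],
    "logFC": [1.179039, 1.166378, 0.752090, 0.758739],
})

# Sort by padj (mergesort is stable, so ties keep their original order),
# then keep the first, i.e. most significant, probe per gene.
one_per_gene = (
    probes.sort_values("padj", kind="mergesort")
          .drop_duplicates(subset="GENE_SYMBOL", keep="first")
          .reset_index(drop=True)
)
print(one_per_gene)
```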

step 67: You now make the cloned network (clone-GSE69844) the current network, so that subsequent py4cytoscape calls act on it by default.

p4c.set_current_network('clone-GSE69844')
{}

step 68: You now integrate the expression table into the node table of the cloned AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the table contains the column names. You also set table='node', table_key_column='CTL.geneID' and network='clone-GSE69844', so that the expression data are matched to the gene identifiers that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('ACR24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}

step 69: You define the style mapping so that the expression values (log2 fold changes, stored in the logFC column) are mapped to the gene nodes. You start by retrieving the logFC column from the node table of the cloned network.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 70: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:

  • Low expression value (minimum) = blue node color
  • Midpoint between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 71: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''

5.3 MA exposure time 1

step 72: You first import the expression table into a new dataframe.

MA_10h= pd.read_csv('adaptedMA10h.tsv',sep='\t')
MA_10h

ID COL ROW NAME SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL ... SEQUENCE SPOT_ID.1 ORDER logFC AveExpr t P.Value padj B ENSEMBLE_GENE_ID
0 20269 147 94 A_23_P435029 A_23_P435029 False NaN BC015544 NaN H3C14 ... CATCACAGTTGACAGGTTAAAAGCATTCACTGCAGCGATCTATGAG... NaN 20269 -0.424471 0.018465 -5.210051 0.000002 0.024894 3.914649 ENSG00000203811
1 5888 232 125 A_23_P428298 A_23_P428298 False NM_173561 NM_173561 222643.0 UNC5CL ... GGGGATATTTTCCCCATGGATCAAGATCCAGTTTAGGGTTGGGAAA... NaN 5888 -0.602074 0.081130 -4.989385 0.000005 0.045422 3.420390 ENSG00000124602
2 9982 208 97 A_24_P159434 A_24_P159434 False NM_007261 NM_007261 11314.0 CD300A ... AGTTTCTCTGGACTCTTAGGTTTATTTTTAATATGAAATATAAAAA... NaN 9982 0.463349 -0.037320 4.850629 0.000008 0.048363 3.111913 ENSG00000167851
3 19016 155 49 A_23_P87678 A_23_P87678 False NM_004950 NM_004950 1833.0 EPYC ... GGATTGATCTGACATCAAATTTAATATCTGAGATTGATGAAGATGC... NaN 19016 0.281188 0.000073 4.846602 0.000009 0.048363 3.102991 ENSG00000083782

4 rows × 29 columns

step 73: Unfortunately, these four genes are not included in the AOP network and thus can’t be mapped.
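Before exporting and loading a table, you can check which significant genes actually occur in the network, as was done by inspection for PTAFR and for these four genes. The sketch below compares the significant gene symbols with the values of the node key column. The network contents are hypothetical placeholders; in practice they could come from p4c.get_table_columns(table='node', columns='CTL.geneID').

```python
import pandas as pd

# Hypothetical stand-in for the node key column of the AOP network.
node_keys = pd.DataFrame({"CTL.geneID": ["GENE_A", None, "GENE_B"]})
network_genes = set(node_keys["CTL.geneID"].dropna())

# Significant genes of MA exposure time 1 (table above).
significant_genes = ["H3C14", "UNC5CL", "CD300A", "EPYC"]

mapped = sorted(set(significant_genes) & network_genes)
unmapped = sorted(set(significant_genes) - network_genes)
print("mapped:", mapped)
print("not in network:", unmapped)
```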

5.4 MA exposure time 2

step 74: You first import the expression table into a new dataframe.

MA_24h= pd.read_csv('adaptedMA24h.tsv',sep='\t')
MA_24h

ID COL ROW NAME SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL ... SEQUENCE SPOT_ID.1 ORDER logFC AveExpr t P.Value padj B ENSEMBLE_GENE_ID
0 21386 141 69 A_23_P163402 A_23_P163402 False NM_000499 NM_000499 1543.0 CYP1A1 ... GGTAAAACAGGGCCACATAGATGCTGATGGAGCCTTCCCAAGTTGT... NaN 21386 -0.821543 0.184573 -15.613779 2.421857e-23 5.450995e-19 41.618011 ENSG00000140465
1 16486 170 9 A_23_P257803 A_23_P257803 False NM_013391 NM_013391 29958.0 DMGDH ... TGGTATTGACCGAACCAACCAGAAACCGGCTTCAGAAAAAAGGTGG... NaN 16486 -0.876482 0.050920 -14.794873 3.529727e-22 5.296355e-18 39.338343 ENSG00000132837
2 20269 147 94 A_23_P435029 A_23_P435029 False NaN BC015544 NaN H3C14 ... CATCACAGTTGACAGGTTAAAAGCATTCACTGCAGCGATCTATGAG... NaN 20269 -1.086935 0.018465 -13.341288 5.056895e-20 5.690903e-16 35.039718 ENSG00000203811
3 16942 167 117 A_23_P1676 A_23_P1676 False NM_001080546 NM_001080546 219854.0 TMEM218 ... TACCCGTACCTTAGGATTTCCAACTGTTTTGAAAGGGAAATAGTAA... NaN 16942 0.455706 -0.067257 12.524192 9.277620e-19 8.352641e-15 32.479715 ENSG00000150433
4 25809 115 63 A_23_P147950 A_23_P147950 False NM_152419 NM_152419 138050.0 HGSNAT ... TCTTTGGAACTTCATTCCGAGGAGATAAGCTTTAACTTTCCAAAAG... NaN 25809 0.471999 -0.082378 12.106840 4.236592e-18 3.178503e-14 31.132408 ENSG00000165102
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2541 15620 175 41 A_24_P331704 A_24_P331704 False NM_182507 NM_182507 144501.0 KRT80 ... CCAAGGGAGCAAATCCTCAGTGGGGATACAAGACATATAAAGTATA... NaN 15620 -0.301665 0.012832 -2.916673 4.895606e-03 4.981367e-02 -0.681984 ENSG00000167767
2542 26372 111 128 A_23_P351724 A_23_P351724 False NM_022648 NM_022648 7145.0 TNS1 ... CTCTAAGCCAGAATGGAAAATTCACCAGGACTCCATTCTTAAGCCT... NaN 26372 0.312052 -0.254361 2.916410 4.899232e-03 4.983931e-02 -0.682639 ENSG00000079308
2543 40411 29 99 A_23_P369701 A_23_P369701 False NM_021214 NM_021214 58489.0 ABHD17C ... ATTACTAGCCAACAGAGTTTTACTATTTTGATTGTCTGGTTGGTTT... NaN 40411 0.200178 0.007205 2.915475 4.912166e-03 4.993703e-02 -0.684974 ENSG00000136379
2544 33634 69 53 A_23_P62115 A_23_P62115 False NM_003254 NM_003254 7076.0 TIMP1 ... CATGGAGAGTGTCTGCGGATACTTCCACAGGTCCCACAACCGCAGC... NaN 33634 -0.221966 0.073864 -2.915367 4.913662e-03 4.994095e-02 -0.685243 ENSG00000102265
2545 29769 91 134 A_23_P133279 A_23_P133279 False NM_199133 NM_199133 134145.0 ATPSCKMT ... CTTGAGAGCTGCCACTCATTTAATATTTCTCATTTATGAGAAGAGA... NaN 29769 0.147226 -0.003311 2.915143 4.916752e-03 4.996108e-02 -0.685800 ENSG00000150756

2546 rows × 29 columns

step 75: You filter the expression table for significant genes (padj < 0.05), drop the ID column, keep only the columns needed for the mapping, and save the result as an Excel file that can be used as input for the py4cytoscape function.

MA24h= pd.read_csv('adaptedMA24h.tsv',sep='\t')
MA24h_version1= MA24h[MA24h['padj'] < 0.05]
MA24h_version2=MA24h_version1.drop('ID', axis=1)
MA24h_version3= MA24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
MA24h_version3.to_excel('MA24h-adjusted.xlsx',index=False)
MA24h_version3

GENE_SYMBOL padj t B logFC SPOT_ID
0 CYP1A1 5.450995e-19 -15.613779 41.618011 -0.821543 A_23_P163402
1 DMGDH 5.296355e-18 -14.794873 39.338343 -0.876482 A_23_P257803
2 H3C14 5.690903e-16 -13.341288 35.039718 -1.086935 A_23_P435029
3 TMEM218 8.352641e-15 12.524192 32.479715 0.455706 A_23_P1676
4 HGSNAT 3.178503e-14 12.106840 31.132408 0.471999 A_23_P147950
... ... ... ... ... ... ...
2541 KRT80 4.981367e-02 -2.916673 -0.681984 -0.301665 A_24_P331704
2542 TNS1 4.983931e-02 2.916410 -0.682639 0.312052 A_23_P351724
2543 ABHD17C 4.993703e-02 2.915475 -0.684974 0.200178 A_23_P369701
2544 TIMP1 4.994095e-02 -2.915367 -0.685243 -0.221966 A_23_P62115
2545 ATPSCKMT 4.996108e-02 2.915143 -0.685800 0.147226 A_23_P133279

2546 rows × 6 columns

step 76: You now make the cloned network (clone-GSE69844) the current network, so that subsequent py4cytoscape calls act on it by default.

p4c.set_current_network('clone-GSE69844')
{}

step 77: You now integrate the expression table into the node table of the cloned AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the table contains the column names. You also set table='node', table_key_column='CTL.geneID' (the same key column used for the other datasets) and network='clone-GSE69844', so that the expression data are matched to the gene identifiers that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('MA24h-adjusted.xlsx', first_row_as_column_names=True, table='node', table_key_column='CTL.geneID', network='clone-GSE69844')
{'mappedTables': [235742, 235780]}

step 78: You define the style mapping so that the expression values (log2 fold changes, stored in the logFC column) are mapped to the gene nodes. You start by retrieving the logFC column from the node table of the cloned network.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 79: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:

  • Low expression value (minimum) = blue node color
  • Midpoint between the minimum and maximum = white node color
  • High expression value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 80: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''

5.5 CP exposure time 1

step 81: You first import the expression table into a new dataframe.

CP_10h= pd.read_csv('adaptedCP10h.tsv',sep='\t')
CP_10h

ID COL ROW NAME SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL ... SEQUENCE SPOT_ID.1 ORDER logFC AveExpr t P.Value padj B ENSEMBLE_GENE_ID
0 26605 110 2 A_23_P216225 A_23_P216225 False NM_004430 NM_004430 1960 EGR3 ... GGTTGTGAATTTCCAGGTACTTGGACTTTTTGTAGAAGTAGAGAGA... NaN 26605 0.863408 0.094340 7.955544 4.246899e-11 0.000002 10.081124 ENSG00000179388
1 41618 22 65 A_23_P212639 A_23_P212639 False NM_004593 NM_004593 6434 TRA2A ... GCATTTGTGTAGTTTGGTGCTTTGTTCCAAGTTAAGTGTTTTCAGA... NaN 41618 0.232531 0.026262 6.309106 3.135061e-08 0.000470 6.437660 ENSG00000164548
2 23385 129 151 A_24_P416370 A_24_P416370 False NM_024015 NM_024015 3214 HOXB4 ... CAGCAGAAGCCTCTCTCCTAGACTGAAAATGAATGTGAAACTAGGA... NaN 23385 -0.241820 -0.033594 -5.680978 3.665825e-07 0.002750 5.006423 ENSG00000182742
3 12053 196 35 A_23_P79155 A_23_P79155 False NM_001508 NM_001508 2863 GPR39 ... TGGAAGAACAATGCAGGAGGGGGTGGCATCTCCTTCAGCTTCAGCA... NaN 12053 -0.201463 -0.027988 -5.199354 2.302939e-06 0.010886 3.913296 ENSG00000183840
4 2479 252 143 A_23_P106194 A_23_P106194 False NM_005252 NM_005252 2353 FOS ... AGAGGGTTCCTGTAGACCTAGGGAGGACCTTATCTGTGCGTGAAAC... NaN 2479 0.652862 0.097279 5.152892 2.742032e-06 0.011221 3.808527 ENSG00000170345
5 5075 237 51 A_23_P143143 A_23_P143143 False NM_002166 NM_002166 3398 ID2 ... AGGCTTCTGAATTCCCTTCTGAGTTAATGTCAAATGACAGCAAAGC... NaN 5075 0.631542 0.093306 5.023820 4.439835e-06 0.014181 3.518391 ENSG00000115738
6 18707 156 158 A_23_P39704 A_23_P39704 False NM_001031684 NM_001031684 6432 SRSF7 ... CTCTCTTCGTAGATCAAGATCAGCTTCACTCAGAAGATCTAGGTCT... NaN 18707 0.196261 0.014785 4.813274 9.650441e-06 0.024134 3.048559 ENSG00000115875
7 1390 258 112 A_23_P131846 A_23_P131846 False NM_005985 NM_005985 6615 SNAI1 ... AACAATGTCTGAAAAGGGACTGTGAGTAATGGCTGTCACTTGTCGG... NaN 1390 0.515691 0.161391 4.589456 2.171080e-05 0.044439 2.554946 ENSG00000124216
8 16555 169 42 A_24_P401615 A_24_P401615 False NM_001039361 NM_001039361 343071 PRAMEF10 ... TTACCTGAGCCAGATGAGCAATCTTCGTGAACTCTTTTTAGCCTTC... NaN 16555 0.595546 0.072522 4.589357 2.171847e-05 0.044439 2.554731 ENSG00000187545

9 rows × 29 columns

step 82: You filter the expression table for significant genes (padj < 0.05), drop the ID column, keep only the columns needed for the mapping, and save the result as an Excel file that can be used as input for the py4cytoscape function.

CP10h= pd.read_csv('adaptedCP10h.tsv',sep='\t')
CP10h_version1= CP10h[CP10h['padj'] < 0.05]
CP10h_version2=CP10h_version1.drop('ID', axis=1)
CP10h_version3= CP10h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
CP10h_version3.to_excel('CP10h-adjusted.xlsx',index=False)
CP10h_version3

GENE_SYMBOL padj t B logFC SPOT_ID
0 EGR3 0.000002 7.955544 10.081124 0.863408 A_23_P216225
1 TRA2A 0.000470 6.309106 6.437660 0.232531 A_23_P212639
2 HOXB4 0.002750 -5.680978 5.006423 -0.241820 A_24_P416370
3 GPR39 0.010886 -5.199354 3.913296 -0.201463 A_23_P79155
4 FOS 0.011221 5.152892 3.808527 0.652862 A_23_P106194
5 ID2 0.014181 5.023820 3.518391 0.631542 A_23_P143143
6 SRSF7 0.024134 4.813274 3.048559 0.196261 A_23_P39704
7 SNAI1 0.044439 4.589456 2.554946 0.515691 A_23_P131846
8 PRAMEF10 0.044439 4.589357 2.554731 0.595546 A_24_P401615

step 83: You now make the cloned network (clone-GSE69844) the current network, so that subsequent py4cytoscape calls act on it by default.

p4c.set_current_network('clone-GSE69844')
{}

step 84: You now integrate the expression table into the node table of the cloned AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the table contains the column names. You also set table='node', table_key_column='CTL.geneID' and network='clone-GSE69844', so that the expression data are matched to the gene identifiers that the CyTargetLinker extension added in the previous notebook.

p4c.load_table_data_from_file('CP10h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}

step 85: You now define the style mapping so that the expression values (logFC, the log2 fold change) are mapped to the gene nodes. As a first step, you retrieve the logFC column from the node table.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 86: Next, you define a color scheme so that low and high expression values receive distinct colors. The scheme uses the following colors:

  • Lowest logFC value (minimum) = blue node color
  • Midpoint between the minimum and the maximum logFC value = white node color
  • Highest logFC value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
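The white point computed above is the midpoint of the logFC range, not logFC = 0. A worked example with the nine logFC values from the CP 10 h table shows where white ends up. It assumes all nine genes were matched to nodes (Log2Foldchange_column only holds values for matched nodes) and uses a NaN for a node without data. The zero-centred range at the end is an alternative that is not used in this notebook.

```python
import numpy as np
import pandas as pd

# logFC values of the nine significant genes in the CP 10 h table, plus a NaN
# standing in for a network node that received no expression value.
logfc = pd.Series([0.863408, 0.232531, -0.241820, -0.201463, 0.652862,
                   0.631542, 0.196261, 0.515691, 0.595546, np.nan])

blue_value = logfc.min()   # NaN is skipped by default -> -0.241820
red_value = logfc.max()    # -> 0.863408
white_value = blue_value + (red_value - blue_value) / 2

print(round(white_value, 6))  # 0.310794: white marks the midpoint, not logFC = 0

# Alternative (not used in this notebook): a range centred on logFC = 0.
bound = float(max(abs(blue_value), abs(red_value)))
symmetric_points = [-bound, 0.0, bound]
print(symmetric_points)
```

With the midpoint scheme, a gene such as SRSF7 (logFC 0.196) is drawn bluish even though it is up-regulated. Passing symmetric_points instead of the three computed values to p4c.set_node_color_mapping would make white correspond to no change.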

step 87: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''
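To estimate which color a given gene receives, the continuous mapping can be approximated outside Cytoscape. The sketch below assumes that Cytoscape interpolates linearly in RGB between the three breakpoints and uses the first and last colors for values outside the range. hex_to_rgb and interpolate_color are hypothetical helpers, not part of py4cytoscape, and the breakpoints are the CP 10 h minimum, midpoint and maximum.

```python
import math


def hex_to_rgb(hex_color):
    """Convert '#RRGGBB' into an (r, g, b) tuple of integers."""
    hex_color = hex_color.lstrip('#')
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))


def interpolate_color(value, points, colors):
    """Approximate a continuous color mapping by linear RGB interpolation.

    Values below the first or above the last breakpoint get the first or last
    color; NaN (no expression value) returns None because no color is mapped.
    """
    if math.isnan(value):
        return None
    if value <= points[0]:
        return colors[0]
    if value >= points[-1]:
        return colors[-1]
    for i in range(len(points) - 1):
        x0, x1 = points[i], points[i + 1]
        if x0 <= value <= x1:
            frac = (value - x0) / (x1 - x0)
            start, end = hex_to_rgb(colors[i]), hex_to_rgb(colors[i + 1])
            rgb = [round(a + frac * (b - a)) for a, b in zip(start, end)]
            return '#{:02X}{:02X}{:02X}'.format(*rgb)


# Breakpoints from the CP 10 h logFC values (minimum, midpoint, maximum)
points = [-0.241820, 0.310794, 0.863408]
colors = ['#0000FF', '#FFFFFF', '#FF0000']

for gene, value in [('HOXB4', -0.241820), ('SRSF7', 0.196261), ('EGR3', 0.863408)]:
    print(gene, interpolate_color(value, points, colors))
```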

5.6 CP exposure time 2

step 88: You first import the expression table into a new dataframe.

CP_24h= pd.read_csv('adaptedCP24h.tsv',sep='\t')
CP_24h

ID COL ROW NAME SPOT_ID CONTROL_TYPE REFSEQ GB_ACC GENE GENE_SYMBOL ... SEQUENCE SPOT_ID.1 ORDER logFC AveExpr t P.Value padj B ENSEMBLE_GENE_ID
0 2985 249 151 A_24_P270728 A_24_P270728 False NM_001042483 NM_001042483 26471.0 NUPR1 ... TATTCCCGCTGACTGAGTCTCTGAGGGGCTACCAGGAAAGCGCCTC... NaN 2985 0.569941 0.167603 10.048946 1.026382e-14 4.620257e-10 22.754357 ENSG00000176046
1 26605 110 2 A_23_P216225 A_23_P216225 False NM_004430 NM_004430 1960.0 EGR3 ... GGTTGTGAATTTCCAGGTACTTGGACTTTTTGTAGAAGTAGAGAGA... NaN 26605 1.036190 0.094340 9.547580 7.333753e-14 1.650645e-09 21.111511 ENSG00000179388
2 20886 144 49 A_23_P1691 A_23_P1691 False NM_002421 NM_002421 4312.0 MMP1 ... ACATGTGCAGTCACTGGTGTCACCCTGGATAGGCAAGGGATAACTC... NaN 20886 -1.110417 0.336235 -9.060676 5.051693e-13 7.580066e-09 19.484600 ENSG00000196611
3 38540 40 101 A_23_P1691 A_23_P1691 False NM_002421 NM_002421 4312.0 MMP1 ... ACATGTGCAGTCACTGGTGTCACCCTGGATAGGCAAGGGATAACTC... NaN 38540 -1.050224 0.329034 -8.267679 1.208709e-11 8.352296e-08 16.778923 ENSG00000196611
4 33376 70 60 A_24_P85300 A_24_P85300 False NM_020733 NM_020733 57493.0 HEG1 ... AGGATGAGCGTACCACTGAAGTCTGAAGATGTCGCCATTGAACGGA... NaN 33376 0.404108 -0.001845 8.218781 1.471445e-11 8.352296e-08 16.610201 ENSG00000173706
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
671 17270 165 141 A_32_P800179 A_32_P800179 False NaN AK094933 NaN SLC30A6 ... CTGTGTGTTAAAAGCATTGTATACGTGAAAAAGGACTCAAACTCAT... NaN 17270 -0.295208 -0.018674 -3.350876 1.364742e-03 4.938413e-02 0.583949 ENSG00000152683
672 20415 146 142 A_23_P162589 A_23_P162589 False NM_001017535 NM_001017535 7421.0 VDR ... CAAGCGAGGTCAACAGAGAAGGCAGGAATGTGTGGCAGATTTAGTG... NaN 20415 -0.180143 0.077814 -3.350611 1.365847e-03 4.938443e-02 0.583251 ENSG00000111424
673 16828 168 5 A_24_P4816 A_24_P4816 False NM_031412 NM_031412 23710.0 GABARAPL1 ... GGATTGGCTTTGATAGAGGAATGGGGATGATGTAAGTTTACAGTAT... NaN 16828 0.328695 0.144492 3.350035 1.368258e-03 4.943190e-02 0.581731 ENSG00000139112
674 7523 222 86 A_23_P18413 A_23_P18413 False NM_016589 NM_016589 51300.0 TIMMDC1 ... TGCTGACAAATTTAAGTGCTGGTACCTGTGGTGGCAGTGGCTTGCT... NaN 7523 -0.104950 0.041448 -3.348029 1.376679e-03 4.965340e-02 0.576441 ENSG00000113845
675 43883 8 126 A_23_P366983 A_23_P366983 False NM_013381 NM_013381 29953.0 TRHDE ... AGTTACCACATATTCACGTTTATAAAATCCTTAATTAAATGAGTAA... NaN 43883 0.170322 -0.008790 3.347479 1.378996e-03 4.966040e-02 0.574992 ENSG00000072657

676 rows × 29 columns
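The 24 h table lists some genes more than once: MMP1 appears on rows 2 and 3 with the same SPOT_ID but different feature IDs and slightly different logFC values. Since each node can hold only one logFC value, it may be worth checking for such duplicates before the import. The sketch below uses a handful of rows copied from the table above. Keeping the row with the smallest padj per gene is one possible rule, chosen here for illustration and not taken from the notebook.

```python
import pandas as pd

# A few rows copied from the CP 24 h table above.
rows = pd.DataFrame({
    'ID': [2985, 26605, 20886, 38540, 33376],
    'GENE_SYMBOL': ['NUPR1', 'EGR3', 'MMP1', 'MMP1', 'HEG1'],
    'logFC': [0.569941, 1.036190, -1.110417, -1.050224, 0.404108],
    'padj': [4.620257e-10, 1.650645e-09, 7.580066e-09, 8.352296e-08, 8.352296e-08],
})

# All rows whose gene symbol occurs more than once.
duplicates = rows[rows['GENE_SYMBOL'].duplicated(keep=False)]
print(duplicates)

# One possible way to collapse them (an assumption, not the notebook's choice):
# keep the row with the smallest padj per gene, then restore the original order.
collapsed = (rows.sort_values('padj')
                 .drop_duplicates(subset='GENE_SYMBOL', keep='first')
                 .sort_index())
print(collapsed)
```

An alternative rule, such as averaging logFC per gene with groupby, would be equally valid; the point is to make the choice explicit rather than leave it to the import.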

step 89: You filter the 24 h expression table in the same way as the 10 h table: only genes with padj < 0.05 are kept, the ID column is dropped, the columns GENE_SYMBOL, padj, t, B, logFC and SPOT_ID are selected, and the result is exported to CP24h-adjusted.xlsx.

# Read the differential expression table for the 24 h CP exposure
CP24h = pd.read_csv('adaptedCP24h.tsv', sep='\t')
# Keep only genes with an adjusted p-value below 0.05
CP24h_version1 = CP24h[CP24h['padj'] < 0.05]
# Remove the ID column
CP24h_version2 = CP24h_version1.drop('ID', axis=1)
# Keep the gene symbol (first column) and the statistics needed for the mapping
CP24h_version3 = CP24h_version2[['GENE_SYMBOL', 'padj', 't', 'B', 'logFC', 'SPOT_ID']]
# Write the filtered table to Excel (needs an Excel writer such as openpyxl)
CP24h_version3.to_excel('CP24h-adjusted.xlsx', index=False)
CP24h_version3

step 90: You now set the current network to the cloned network (clone-GSE69844).

p4c.set_current_network('clone-GSE69844')
{}

step 91: You now integrate the filtered expression table into the node table of the AOP network with the function p4c.load_table_data_from_file. You set first_row_as_column_names=True because the first row of the table holds the column names. You also select 'node' for table and 'CTL.GeneID' for table_key_column, so that the expression data are matched to the genes that were added by the CyTargetLinker extension in the previous notebook.

p4c.load_table_data_from_file('CP24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}

step 92: You now define the style mapping so that the expression values (logFC, the log2 fold change) are mapped to the gene nodes. As a first step, you retrieve the logFC column from the node table.

Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')

step 93: Next, you define a color scheme so that low and high expression values receive distinct colors. The scheme uses the following colors:

  • Lowest logFC value (minimum) = blue node color
  • Midpoint between the minimum and the maximum logFC value = white node color
  • Highest logFC value (maximum) = red node color

This color scheme was also described in the official py4cytoscape documentation (1).

Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2

step 94: You apply this color scheme to the network.

p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''

Section 6: Metadata

step 95: Finally, the metadata of this Jupyter notebook is displayed, which contains the version numbers of the packages and the system set-up for interested users. This requires the packages watermark and print-versions.

%load_ext watermark
!pip install print-versions
Requirement already satisfied: print-versions in c:\users\shaki\anaconda3\lib\site-packages (0.1.0)
%watermark
Last updated: 2025-06-03T17:26:16.367030+02:00

Python implementation: CPython
Python version       : 3.12.3
IPython version      : 8.25.0

Compiler    : MSC v.1938 64 bit (AMD64)
OS          : Windows
Release     : 11
Machine     : AMD64
Processor   : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores   : 8
Architecture: 64bit
from print_versions import print_versions
print_versions(globals())
json==2.0.9
ipykernel==6.28.0
numpy==1.26.4
pandas==2.2.2
ipywidgets==8.0.3
xarray==2023.6.0
py4cytoscape==1.9.0
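If installing print-versions is not an option, a comparable list can be produced with the standard library alone. This is a sketch: package_versions is a hypothetical helper, and the package names passed to it are only examples.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Return {name: version} for the given distribution names plus Python itself."""
    versions = {'python': platform.python_version()}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = 'not installed'
    return versions


for name, ver in package_versions(['pandas', 'numpy', 'py4cytoscape']).items():
    print(f'{name}=={ver}')
```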

Reference:

  1. Basic Data Visualization — py4cytoscape 0.0.5 documentation [Internet]. Readthedocs.io. 2021 [cited 2025 Feb 26]. Available from: https://py4cytoscape.readthedocs.io/en/0.0.5/tutorials/basic-data-visualization.html