Part 5: Visualization of transcriptomics expression datasets in the enriched AOP network part 2
The AOP project ► Key objective 2
Author: Shakira Agata
This Jupyter notebook describes the steps needed to map transcriptomics datasets onto the constructed enriched AOP network. For this notebook, open-license transcriptomics datasets were obtained from ArrayExpress and Gene Expression Omnibus (GEO). These datasets were preprocessed and then analyzed statistically to identify differentially expressed genes (DEGs). The resulting tables of differential gene expression data were subsequently mapped onto the network. This notebook is subdivided into the following six sections:
- Section 1: System preparation
- Section 2: Retrieval of the enriched AOP network
- Section 3: Adaptation of gene node color within the enriched AOP network
- Section 4: Mapping of dataset: GSE69844
- Section 4.1 Bisphenol A
- Section 4.2 Farnesol
- Section 4.3 Tetrachlorodibenzo p-dioxin
- Section 4.4 Troglitazone
- Section 4.5 Valproic acid
- Section 5: Mapping of dataset: E-GEOD-69851
- Section 5.1 ACR exposure time 1
- Section 5.2 ACR exposure time 2
- Section 5.3 MA exposure time 1
- Section 5.4 MA exposure time 2
- Section 5.5 CP exposure time 1
- Section 5.6 CP exposure time 2
- Section 6: Metadata
Section 1: System preparation
In this section, you will import the packages and tools required for this Jupyter notebook.
step 1: You import pandas, py4cytoscape, and the style mapping functions of py4cytoscape.
import pandas as pd
import glob
import py4cytoscape as p4c
p4c.cytoscape_ping()
p4c.cytoscape_version_info()
You are connected to Cytoscape!
{'apiVersion': 'v1',
'cytoscapeVersion': '3.10.1',
'automationAPIVersion': '1.9.0',
'py4cytoscapeVersion': '1.9.0'}
from py4cytoscape import get_node_color
from py4cytoscape import set_node_color_mapping
from py4cytoscape import gen_node_color_map
from py4cytoscape import set_edge_color_default
from py4cytoscape import set_node_color_default
from py4cytoscape import set_edge_source_arrow_shape_default
from py4cytoscape import set_edge_target_arrow_shape_default
from py4cytoscape import get_arrow_shapes
from py4cytoscape import get_edge_target_arrow_shape
from py4cytoscape import set_edge_target_arrow_shape_mapping
from py4cytoscape import gen_edge_arrow_map
from py4cytoscape import select_nodes
from py4cytoscape import get_table_value
from py4cytoscape import get_network_suid
from py4cytoscape import clear_selection
from py4cytoscape import set_node_color_bypass
from py4cytoscape import set_edge_color_bypass
from py4cytoscape import set_edge_target_arrow_color_default
from py4cytoscape import set_node_size_bypass
from py4cytoscape import create_subnetwork
Section 2: Retrieval of molecular inflammation-process related AOP network
In this section, you will retrieve the enriched AOP network by opening the Cytoscape session that was saved in the previous notebook.
step 2: You open the session you saved in the previous Jupyter notebook.
p4c.open_session('Agata,S.-Part4-Complete Molecular inflammation-process related AOP network.cys')
Opening C:\Users\shaki\Downloads\Agata,S.-Part4-Complete Molecular inflammation-process related AOP network.cys...
{}
Section 3: Adaptation of gene node color within the enriched AOP network
In this section, you will change the node color of the genes and adapt the style for easier interpretation of the upcoming results. This prepares the network for the mapping of the transcriptomics datasets. The datasets may not cover every gene in the constructed AOP network, so genes without expression data should receive a distinct color to inform the user correctly.
step 3: You can change the style with the following commands.
style_name = "default"
defaults = {'NODE_SHAPE': "ELLIPSE", 'NODE_SIZE': 20, 'EDGE_TRANSPARENCY': 140, 'NODE_LABEL_POSITION': "C,C,c,0.00,0.00"}
nodeLabels = p4c.map_visual_property('node label', 'name', 'p')
edgeWidth = p4c.map_visual_property('edge width', 'weight', 'p')
arrowShapes = p4c.map_visual_property('Edge Target Arrow Shape','interaction', 'd')
p4c.create_visual_style(style_name, defaults, [nodeLabels, edgeWidth])
p4c.set_visual_style(style_name)
{'message': 'Visual Style applied.'}
set_node_color_default('#a7a5a5',style_name='default')
set_edge_color_default('#01e735', style_name='default')
''
p4c.clone_network()
p4c.rename_network('clone-GSE69844')
{'network': 235771, 'title': 'clone-GSE69844'}
Section 4: Mapping of dataset: GSE69844
In this section, you will map the transcriptomics expression data of dataset GSE69844. This is done in a similar fashion to the previous section, but streamlined because of the high number of data files. In preparation for this section, you must first download the data files of the chemicals into separate folders.
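The notebook imports glob, but the cells in this section name each data file by hand. A minimal sketch of how a per-chemical folder could be scanned instead, assuming one folder per chemical with the TSV files inside; the helper name collect_expression_files and the folder layout are assumptions for illustration and are not part of the original workflow.

```python
import glob
import os


def collect_expression_files(folder, pattern="*.tsv"):
    """Return the sorted paths of all files in `folder` that match `pattern`.

    Hypothetical helper: the folder layout (one folder per chemical) is an
    assumption based on the preparation step described in the text.
    """
    return sorted(glob.glob(os.path.join(folder, pattern)))
```

Each returned path could then be passed to pd.read_csv(path, sep='\t') in the same way as the individual cells do.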
4.1 Bisphenol A
4.1.1 Concentration 1
step 4: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
BisphenolA_concentration1= pd.read_csv('GSE69844.BisphenolA-1uM.tsv',sep='\t')
Adjusted_BisphenolA_concentration1= BisphenolA_concentration1[BisphenolA_concentration1['padj'] < 0.05]
BisphenolA_concentration_1=Adjusted_BisphenolA_concentration1.drop('ID', axis=1)
BisphenolA_Concentration1 = BisphenolA_concentration_1[['Entrez.Gene','Gene.Symbol','padj', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration1.to_excel('GSE69844.BisphenolA-concentration1.xlsx',index=False)
BisphenolA_Concentration1
| Entrez.Gene | Gene.Symbol | padj | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|---|
0 | 28996 | HIPK2 | 0.000026 | -12.821942 | 12.209372 | -0.593550 | NM_001113239,NM_022740,XM_001716827,XM_925800 | NaN |
1 | 87 | ACTN1 | 0.000026 | -12.667672 | 12.062264 | -0.585774 | NM_001102,NM_001130004,NM_001130005 | NaN |
2 | 1455 | CSNK1G2 | 0.000029 | -12.228128 | 11.631158 | -0.489035 | NM_001319 | NaN |
3 | 2316 | FLNA | 0.000038 | -11.657262 | 11.043619 | -0.444861 | NM_001110556,NM_001456 | NaN |
4 | 23524 | SRRM2 | 0.000038 | -11.409780 | 10.778745 | -0.554962 | NM_016333 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... |
3472 | 7485 | WRB | 0.049943 | 3.424377 | -1.958139 | 0.136566 | NM_001146218,NM_004627 | NaN |
3473 | 81607 | PVRL4 | 0.049943 | -3.424351 | -1.958192 | -0.152348 | NM_030916 | NaN |
3474 | 91057 | CCDC34 | 0.049943 | 3.424349 | -1.958195 | 0.170781 | NM_030771,NM_080654 | NaN |
3475 | 79780 | CCDC82 | 0.049976 | 3.423893 | -1.959112 | 0.138349 | NM_024725 | NaN |
3476 | 2108 | ETFA | 0.049982 | 3.423700 | -1.959501 | 0.142652 | NM_000126,NM_001127716 | NaN |
3477 rows × 8 columns
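The filter-and-select pattern of step 4 is repeated for every chemical and concentration. It could be wrapped in a small function, sketched here under the assumption that only the name of the adjusted p-value column differs between files ('padj' here, 'adj.P.Val' in later cells). The function name filter_significant and the toy table are illustrative and not part of the original notebook.

```python
import pandas as pd


def filter_significant(table, p_column="padj", alpha=0.05, keep=None):
    """Keep rows whose adjusted p-value is below `alpha`.

    `keep` optionally lists the columns to retain, in order; columns that
    are not listed (for example 'ID') are dropped.
    """
    significant = table[table[p_column] < alpha]
    if keep is not None:
        significant = significant[keep]
    return significant.reset_index(drop=True)


# Toy data with made-up values, only to show the call.
toy = pd.DataFrame({
    "ID": ["probe_1", "probe_2", "probe_3"],
    "Gene.Symbol": ["GENE_A", "GENE_B", "GENE_C"],
    "padj": [0.001, 0.200, 0.049],
    "logFC": [-0.6, 0.1, 0.2],
})
filtered = filter_significant(toy, keep=["Gene.Symbol", "padj", "logFC"])
```

For the later files, the same call would use p_column="adj.P.Val".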
step 5: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneID' so that the expression data are matched to the gene IDs that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='Agata,S.-Part4-Molecular inflammation-process related AOP network')
{'mappedTables': [388893, 388931]}
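The dictionary returned above lists table identifiers but does not show how many genes were matched. A minimal sketch of an offline check that counts how many keys of the expression table also occur in the key column of the node table. The helper count_key_matches is hypothetical, and comparing the keys as stripped strings is an assumption about how the identifiers are stored.

```python
import pandas as pd


def count_key_matches(expression_keys, node_keys):
    """Count the distinct keys that occur in both collections.

    Keys are compared as stripped strings and missing values are ignored.
    Caveat: an ID stored as a float (28996.0) will not match '28996'.
    """
    left = {str(k).strip() for k in expression_keys if pd.notna(k)}
    right = {str(k).strip() for k in node_keys if pd.notna(k)}
    return len(left & right)


# Toy node keys; the expression keys are the first three Entrez IDs shown above.
matches = count_key_matches([28996, 87, 1455], ["28996", "2316", None])
```

In a live session, the node keys could be taken from p4c.get_table_columns(table='node', columns='CTL.GeneID'), the same call the notebook uses for the logFC column.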
step 6: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 7: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
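Note that the white breakpoint computed above is the midpoint between the minimum and the maximum logFC. It equals zero only when the two extremes are symmetric around zero. With toy values of -0.594 and 0.370, for example, the midpoint is -0.112, so a gene with no change would be drawn slightly red. The sketch below shows an alternative that pins white to logFC = 0. It is an assumption-based variant, not the method used in this notebook, and the name symmetric_breakpoints is hypothetical.

```python
import pandas as pd


def symmetric_breakpoints(values):
    """Return [-m, 0.0, m], where m is the largest absolute non-missing value."""
    numeric = pd.to_numeric(pd.Series(values), errors="coerce").dropna()
    if numeric.empty:
        raise ValueError("no numeric values to map")
    limit = float(numeric.abs().max())
    return [-limit, 0.0, limit]


# Toy values, including a missing entry as found in the node table.
breakpoints = symmetric_breakpoints([None, 0.153, -0.594, 0.370])
```

The resulting list could be passed as the second argument of p4c.set_node_color_mapping in place of the three variables above.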
step 8: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.1.2 Concentration 2
step 9: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
BisphenolA_concentration2= pd.read_csv('GSE69844.BisphenolAConcentration2.tsv',sep='\t')
Adjusted_BisphenolA_concentration2= BisphenolA_concentration2[BisphenolA_concentration2['adj.P.Val'] < 0.05]
BisphenolA_concentration_2=Adjusted_BisphenolA_concentration2.drop('ID', axis=1)
BisphenolA_Concentration2 = BisphenolA_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration2.to_excel('GSE69844.BisphenolA-concentration2.xlsx',index=False)
BisphenolA_Concentration2
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | COL8A1 | 0.0172 | -9.304198 | 3.89521 | -0.381 | NM_001850,NM_020351 | NaN |
1 | S100A2 | 0.0414 | 8.109804 | 3.14555 | 0.330 | NM_005978 | NaN |
step 10: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneID' so that the expression data are matched to the gene IDs that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID')
{'mappedTables': [388893, 388931]}
step 11: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 12: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 13: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.1.3 Concentration 3
step 14: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
BisphenolA_concentration3= pd.read_csv('GSE69844.BisphenolAConcentration3.tsv',sep='\t')
Adjusted_BisphenolA_concentration3= BisphenolA_concentration3[BisphenolA_concentration3['adj.P.Val'] < 0.05]
BisphenolA_concentration_3=Adjusted_BisphenolA_concentration3.drop('ID', axis=1)
BisphenolA_Concentration3 = BisphenolA_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
BisphenolA_Concentration3.to_excel('GSE69844.BisphenolA-concentration3.xlsx',index=False)
BisphenolA_Concentration3
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | COL8A1 | 0.000002 | -15.333380 | 15.028626 | -0.649 | NM_001850,NM_020351 | NaN |
1 | ZBTB16 | 0.000003 | -14.358816 | 14.190854 | -0.754 | NM_001018011,NM_006006 | NaN |
2 | PDE4DIP | 0.000003 | -14.169566 | 14.020061 | -0.640 | NM_001002810,NM_001002811,NM_001002812,NM_0146... | NaN |
3 | GGT5 | 0.000003 | -13.645523 | 13.532532 | -0.596 | NM_001099781,NM_001099782,NM_004121 | NaN |
4 | ZFP36L2 | 0.000003 | -13.583284 | 13.473166 | -0.520 | NM_006887 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
5726 | PSAT1 | 0.049904 | 3.176887 | -2.554956 | 0.136 | NM_021154,NM_058179 | NaN |
5727 | PXN | 0.049921 | 3.176598 | -2.555540 | 0.170 | NM_001080855,NM_002859,NM_025157 | NaN |
5728 | SET | 0.049921 | -3.176562 | -2.555612 | -0.125 | NM_001122821,NM_003011 | NaN |
5729 | SIGMAR1 | 0.049943 | -3.176276 | -2.556190 | -0.153 | NM_005866,NM_147157 | NaN |
5730 | SEMA3B | 0.049965 | -3.175985 | -2.556778 | -0.133 | NM_001005914,NM_004636 | NaN |
5731 rows × 7 columns
step 15: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneID' so that the expression data are matched to the gene IDs that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.BisphenolA-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID')
{'mappedTables': [388893, 388931]}
step 16: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 17: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 18: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.2 Farnesol
4.2.1 Concentration 1
step 19: You first import the expression table into a new dataframe and filter for the significant genes. Concentration 1 of Farnesol does not yield any significant genes and therefore cannot be mapped onto the AOP network.
Farnesol_concentration1= pd.read_csv('GSE69844.FarnesolConcentration1.tsv',sep='\t')
Adjusted_Farnesol_concentration1= Farnesol_concentration1[Farnesol_concentration1['adj.P.Val'] < 0.05]
Farnesol_concentration_1=Adjusted_Farnesol_concentration1.drop('ID', axis=1)
Farnesol_Concentration1 = Farnesol_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration1.to_excel('GSE69844.Farnesol-concentration1.xlsx',index=False)
Farnesol_Concentration1
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
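The cell above still writes an Excel file even though the filtered table is empty. A minimal sketch of a guard that checks for significant genes before exporting and mapping. The helper name has_significant_genes is hypothetical and the toy table contains made-up values.

```python
import pandas as pd


def has_significant_genes(table, p_column="adj.P.Val", alpha=0.05):
    """Return True if at least one row has an adjusted p-value below `alpha`."""
    return bool((table[p_column] < alpha).any())


# Toy table in which no gene passes the threshold.
toy = pd.DataFrame({"Gene.Symbol": ["GENE_A", "GENE_B"],
                    "adj.P.Val": [0.30, 0.75]})
mappable = has_significant_genes(toy)
```

When the check returns False, the export and the mapping steps for that concentration can be skipped.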
4.2.2 Concentration 2
step 20: You first import the expression table into a new dataframe and filter for the significant genes. Concentration 2 of Farnesol does not yield any significant genes and therefore cannot be mapped onto the AOP network.
Farnesol_concentration2= pd.read_csv('GSE69844.FarnesolConcentration2.tsv',sep='\t')
Adjusted_Farnesol_concentration2= Farnesol_concentration2[Farnesol_concentration2['adj.P.Val'] < 0.05]
Farnesol_concentration_2=Adjusted_Farnesol_concentration2.drop('ID', axis=1)
Farnesol_Concentration2 = Farnesol_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration2.to_excel('GSE69844.Farnesol-concentration2.xlsx',index=False)
Farnesol_Concentration2
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
4.2.3 Concentration 3
step 21: You first import the expression table into a new dataframe.
Farnesol_concentration3= pd.read_csv('GSE69844.FarnesolConcentration3.tsv',sep='\t')
Adjusted_Farnesol_concentration3= Farnesol_concentration3[Farnesol_concentration3['adj.P.Val'] < 0.05]
Farnesol_concentration_3=Adjusted_Farnesol_concentration3.drop('ID', axis=1)
Farnesol_Concentration3 = Farnesol_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Farnesol_Concentration3.to_excel('GSE69844.Farnesol-concentration3.xlsx',index=False)
Farnesol_Concentration3
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | IGFBP1 | 0.00101 | 13.437237 | 6.97653 | 0.709 | NM_000596 | NaN |
1 | CBX5 | 0.01410 | -9.847783 | 5.16687 | -0.627 | NM_001127321,NM_001127322,NM_012117 | NaN |
2 | --- | 0.01543 | -9.388084 | 4.85421 | -0.569 | NaN | --AFFX-HUMRGE/M10098_3 |
3 | ANGPTL4 | 0.03269 | 8.326084 | 4.03872 | 0.701 | NM_001039667,NM_139314 | NaN |
4 | FNDC3B | 0.03269 | -8.294095 | 4.01196 | -0.439 | NM_001135095,NM_022763 | NaN |
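Row 2 of the table above has the placeholder '---' instead of a gene symbol (an AFFX control probe), so it cannot correspond to a gene node. A minimal sketch of how such rows could be removed before export, assuming '---' is the only placeholder in use. The helper name drop_unannotated is hypothetical.

```python
import pandas as pd


def drop_unannotated(table, gene_column="Gene.Symbol", placeholder="---"):
    """Remove rows whose gene symbol is missing or equals `placeholder`."""
    symbols = table[gene_column]
    keep = symbols.notna() & (symbols.astype(str).str.strip() != placeholder)
    return table[keep].reset_index(drop=True)


# First three rows of the table above, reduced to two columns.
toy = pd.DataFrame({"Gene.Symbol": ["IGFBP1", "CBX5", "---"],
                    "logFC": [0.709, -0.627, -0.569]})
annotated = drop_unannotated(toy)
```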
step 22: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneID' so that the expression data are matched to the gene IDs that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Farnesol-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID')
{'mappedTables': [388893, 388931]}
step 23: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
Log2Foldchange_column
| logFC |
---|---|
389124 | NaN |
401415 | NaN |
393220 | NaN |
401410 | 0.153 |
389120 | NaN |
... | ... |
401405 | 0.158 |
389112 | NaN |
401400 | NaN |
393210 | 0.370 |
397305 | NaN |
2952 rows × 1 columns
step 24: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 25: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.4 Tetrachlorodibenzo p-dioxin
4.4.1 Concentration 1
step 26: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
Tpdioxin_concentration1= pd.read_csv('GSE69844.TpdioxinConcentration1.tsv',sep='\t')
Adjusted_Tpdioxin_concentration1= Tpdioxin_concentration1[Tpdioxin_concentration1['adj.P.Val'] < 0.05]
Tpdioxin_concentration_1=Adjusted_Tpdioxin_concentration1.drop('ID', axis=1)
Tpdioxin_Concentration1 = Tpdioxin_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration1.to_excel('GSE69844.Tpdioxin-concentration1.xlsx',index=False)
Tpdioxin_Concentration1
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | CYP1B1 | 0.000628 | 54.309631 | 3.33321 | 3.630 | NM_000104 | NaN |
1 | CYP1B1 | 0.002197 | 37.892762 | 3.22492 | 3.550 | NM_000104 | NaN |
2 | CYP1B1 | 0.002271 | 34.935858 | 3.18861 | 3.270 | NM_000104 | NaN |
3 | IER3 | 0.003985 | 29.839182 | 3.10077 | 1.430 | NM_003897 | NaN |
4 | CYP1A1 | 0.004145 | 27.446373 | 3.04317 | 3.280 | NM_000499 | NaN |
5 | CYP1B1 | 0.004145 | 27.033615 | 3.03179 | 3.710 | NM_000104 | NaN |
6 | CYP1A1 | 0.004145 | 26.696606 | 3.02214 | 3.190 | NM_000499 | NaN |
7 | HSD17B2 | 0.017668 | 19.876908 | 2.72569 | 0.923 | NM_002153 | NaN |
8 | SLC7A11 | 0.041721 | 16.244113 | 2.42797 | 0.782 | NM_014331 | NaN |
9 | GDF15 /// LOC100292463 | 0.041721 | 16.141041 | 2.41710 | 0.890 | NM_004864,XM_002345162 | NaN |
10 | TIPARP | 0.041721 | 15.940321 | 2.39544 | 0.708 | NM_001184717,NM_001184718,NM_015508 | NaN |
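In the table above, several probes map to the same gene symbol (CYP1B1 appears four times, CYP1A1 twice), while each gene node can display only one logFC value. How the import resolves such duplicates is not stated here. The sketch below makes the choice explicit by keeping the probe with the smallest adjusted p-value per gene. This selection rule and the helper name one_row_per_gene are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd


def one_row_per_gene(table, gene_column="Gene.Symbol", p_column="adj.P.Val"):
    """Keep, for each gene, the row with the smallest adjusted p-value.

    The stable sort keeps the original order among ties, so the first of
    several equally significant probes is the one retained.
    """
    ordered = table.sort_values(p_column, kind="stable")
    unique = ordered.drop_duplicates(subset=gene_column, keep="first")
    return unique.reset_index(drop=True)


# Rows 0, 1, 3, 4 and 6 of the table above, reduced to three columns.
probes = pd.DataFrame({
    "Gene.Symbol": ["CYP1B1", "CYP1B1", "IER3", "CYP1A1", "CYP1A1"],
    "adj.P.Val": [0.000628, 0.002197, 0.003985, 0.004145, 0.004145],
    "logFC": [3.630, 3.550, 1.430, 3.280, 3.190],
})
genes = one_row_per_gene(probes)
```

Other rules, such as averaging logFC over the probes of a gene, are equally possible. The point is to choose one deliberately before the table is exported.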
step 27: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneName' so that the expression data are matched to the gene names that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneName')
{'mappedTables': [388893, 388931]}
step 28: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 29: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 30: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.4.2 Concentration 2
step 31: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
Tpdioxin_concentration2= pd.read_csv('GSE69844.TpdioxinConcentration2.tsv',sep='\t')
Adjusted_Tpdioxin_concentration2= Tpdioxin_concentration2[Tpdioxin_concentration2['adj.P.Val'] < 0.05]
Tpdioxin_concentration_2=Adjusted_Tpdioxin_concentration2.drop('ID', axis=1)
Tpdioxin_Concentration2 = Tpdioxin_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration2.to_excel('GSE69844.Tpdioxin-concentration2.xlsx',index=False)
Tpdioxin_Concentration2
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | CYP1B1 | 0.00381 | 34.144660 | -1.85 | 3.223325 | NM_000104 | NaN |
1 | CYP1B1 | 0.00381 | 32.206372 | -1.86 | 3.642847 | NM_000104 | NaN |
2 | CYP1B1 | 0.00978 | 25.244475 | -1.87 | 3.359260 | NM_000104 | NaN |
step 32: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneName' so that the expression data are matched to the gene names that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneName')
{'mappedTables': [388893, 388931]}
step 33: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 34: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 35: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.4.3 Concentration 3
step 36: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
Tpdioxin_concentration3= pd.read_csv('GSE69844.TpdioxinConcentration3.tsv',sep='\t')
Adjusted_Tpdioxin_concentration3= Tpdioxin_concentration3[Tpdioxin_concentration3['adj.P.Val'] < 0.05]
Tpdioxin_concentration_3=Adjusted_Tpdioxin_concentration3.drop('ID', axis=1)
Tpdioxin_Concentration3 = Tpdioxin_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Tpdioxin_Concentration3.to_excel('GSE69844.Tpdioxin-concentration3.xlsx',index=False)
Tpdioxin_Concentration3
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | CYP1B1 | 0.000035 | 66.355926 | 7.262273 | 3.951611 | NM_000104 | NaN |
1 | CYP1B1 | 0.000050 | 55.772416 | 7.162770 | 3.734094 | NM_000104 | NaN |
2 | CYP1A1 | 0.000060 | 50.590763 | 7.091598 | 3.380474 | NM_000499 | NaN |
3 | CYP1B1 | 0.000069 | 47.138520 | 7.031778 | 3.502514 | NM_000104 | NaN |
4 | SERPINB2 | 0.000145 | 40.135573 | 6.865297 | 3.556188 | NM_001143818,NM_002575 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
145 | MPHOSPH6 | 0.049924 | 8.406946 | 1.666651 | 0.409146 | NM_005792 | NaN |
146 | RUNX1 | 0.049924 | 8.402542 | 1.663899 | 0.420556 | NM_001001890,NM_001122607,NM_001754 | NaN |
147 | A1CF | 0.049924 | -8.400467 | 1.662601 | -0.490591 | NM_014576,NM_138932,NM_138933 | NaN |
148 | ABLIM1 | 0.049924 | -8.394971 | 1.659163 | -0.487655 | NM_001003407,NM_001003408,NM_002313,NM_006720 | NaN |
149 | GNA13 | 0.049924 | 8.386056 | 1.653580 | 0.544947 | NM_006572 | NaN |
150 rows × 7 columns
step 37: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneName' so that the expression data are matched to the gene names that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Tpdioxin-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneName')
{'mappedTables': [388893, 388931]}
step 38: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 39: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 40: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.5 Valproic acid
4.5.1 Concentration 1
step 41: You first import the expression table into a new dataframe followed by data manipulation. This is done so that the final table can be used as input for the py4cytoscape function.
Valproicacid_concentration1= pd.read_csv('GSE69844.ValproicacidConcentration1.tsv',sep='\t')
Adjusted_Valproicacid_concentration1= Valproicacid_concentration1[Valproicacid_concentration1['adj.P.Val'] < 0.05]
Valproicacid_concentration_1=Adjusted_Valproicacid_concentration1.drop('ID', axis=1)
Valproicacid_Concentration1 = Valproicacid_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration1.to_excel('GSE69844.Valproicacid-concentration1.xlsx',index=False)
Valproicacid_Concentration1
| Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|---|---|---|---|---|---|---|
0 | CXorf26 | 0.00223 | -28.40 | 8.250030 | -1.090 | NM_016500 | NaN |
1 | ONECUT2 | 0.00223 | 26.20 | 7.967265 | 1.310 | NM_004852 | NaN |
2 | KEAP1 | 0.00275 | -22.30 | 7.312031 | -1.190 | NM_012289,NM_203500 | NaN |
3 | ITPRIPL2 | 0.00275 | -22.30 | 7.302028 | -0.990 | NM_001034841,NR_028028 | NaN |
4 | TMEM170B | 0.00275 | 22.00 | 7.241510 | 1.470 | NM_001100829 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
4499 | MTF2 | 0.04987 | 4.26 | -2.075831 | 0.294 | NM_001164391,NM_001164392,NM_001164393,NM_007358 | NaN |
4500 | CDK5RAP1 | 0.04994 | 4.26 | -2.077623 | 0.230 | NM_016082,NM_016408 | NaN |
4501 | DPF2 | 0.04994 | -4.26 | -2.077706 | -0.233 | NM_006268 | NaN |
4502 | ROR2 | 0.04998 | 4.26 | -2.078890 | 0.217 | NM_004560 | NaN |
4503 | SLC9A8 | 0.04998 | -4.26 | -2.079096 | -0.222 | NM_015266 | NaN |
4504 rows × 7 columns
step 42: You now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names=True because the first row of the expression table holds the column names. Set table='node' and table_key_column='CTL.GeneID' so that the expression data are matched to the gene IDs that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Valproicacid-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID')
{'mappedTables': [388893, 388931]}
step 43: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 44: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
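The white breakpoint computed above is the centre of the observed logFC range, which equals zero only when the minimum and maximum are symmetric. The following is a minimal sketch of an alternative, assuming you would rather have white sit exactly at logFC = 0. The helper name `symmetric_breakpoints` is hypothetical and is not part of py4cytoscape.

```python
def symmetric_breakpoints(values):
    """Return (low, mid, high) breakpoints centred on zero.

    Hypothetical helper for illustration; None and NaN entries are ignored.
    """
    finite = [v for v in values if v is not None and v == v]  # NaN != NaN
    if not finite:
        raise ValueError("no numeric logFC values to map")
    # Use the largest absolute value as the bound on both sides of zero.
    bound = max(abs(min(finite)), abs(max(finite)))
    return -bound, 0.0, bound


# Toy logFC values, including a node without expression data (NaN).
low, mid, high = symmetric_breakpoints([-1.19, 0.23, 1.47, float("nan")])
print(low, mid, high)
```

The three returned values could be passed as the list of breakpoints to p4c.set_node_color_mapping in place of the minimum, midpoint and maximum used in this notebook.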
step 45: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.5.2 Concentration 2
step 46: You first import the expression table into a new dataframe and filter for the significant genes (adj.P.Val < 0.05). It turns out that concentration 2 of valproic acid yields no significant genes, so it cannot be mapped in the AOP network.
Valproicacid_concentration2= pd.read_csv('GSE69844.ValproicacidConcentration2.tsv',sep='\t')
Adjusted_Valproicacid_concentration2= Valproicacid_concentration2[Valproicacid_concentration2['adj.P.Val'] < 0.05]
Valproicacid_concentration_2=Adjusted_Valproicacid_concentration2.drop('ID', axis=1)
Valproicacid_Concentration2 = Valproicacid_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration2.to_excel('GSE69844.Valproicacid-concentration2.xlsx',index=False)
Valproicacid_Concentration2
Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|
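Because the significance filter can return an empty table, as it does here, it can help to check for that case before exporting and mapping. The following is a minimal sketch with toy data. The helper `significant_rows` and the values in `toy` are illustrative assumptions and are not taken from the dataset.

```python
import pandas as pd


def significant_rows(table, p_column="adj.P.Val", alpha=0.05):
    """Return the rows whose adjusted p-value is below alpha."""
    return table[table[p_column] < alpha]


# Toy table in which no gene passes the 0.05 threshold.
toy = pd.DataFrame({
    "Gene.Symbol": ["GENE_A", "GENE_B"],
    "adj.P.Val": [0.31, 0.87],
    "logFC": [0.10, -0.20],
})

hits = significant_rows(toy)
if hits.empty:
    print("No significant genes; skipping export and mapping.")
else:
    print(f"{len(hits)} significant genes ready for export.")
```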
4.5.3 Concentration 3
step 47: You first import the expression table into a new dataframe and filter for the significant genes (adj.P.Val < 0.05). It turns out that concentration 3 of valproic acid yields no significant genes, so it cannot be mapped in the AOP network.
Valproicacid_concentration3= pd.read_csv('GSE69844.ValproicacidConcentration3.tsv',sep='\t')
Adjusted_Valproicacid_concentration3= Valproicacid_concentration3[Valproicacid_concentration3['adj.P.Val'] < 0.05]
Valproicacid_concentration_3=Adjusted_Valproicacid_concentration3.drop('ID', axis=1)
Valproicacid_Concentration3 = Valproicacid_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Valproicacid_Concentration3.to_excel('GSE69844.Valproicacid-concentration3.xlsx',index=False)
Valproicacid_Concentration3
Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID |
---|
4.7 Troglitazone
4.7.1 Concentration 1
step 48: You first import the expression table into a new dataframe, keep the genes with an adjusted p-value below 0.05, drop the ID column, and select the columns needed for the mapping. The final table is exported so that it can be used as input for the py4cytoscape function.
Troglitazone_concentration1= pd.read_csv('GSE69844.TroglitazoneConcentration1.tsv',sep='\t')
Adjusted_Troglitazone_concentration1= Troglitazone_concentration1[Troglitazone_concentration1['adj.P.Val'] < 0.05]
Troglitazone_concentration_1=Adjusted_Troglitazone_concentration1.drop('ID', axis=1)
Troglitazone_Concentration1 = Troglitazone_concentration_1[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration1.to_excel('GSE69844.Troglitazone-concentration1.xlsx',index=False)
Troglitazone_Concentration1
Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID | |
---|---|---|---|---|---|---|---|
0 | FABP4 | 2.170000e-07 | 26.70 | 14.69249 | 2.278158 | NM_001442 | NaN |
1 | CSNK1G2 | 1.510000e-04 | -14.40 | 10.27064 | -0.585872 | NM_001319 | NaN |
2 | ACTN1 | 2.410000e-04 | -13.30 | 9.59510 | -0.580898 | NM_001102,NM_001130004,NM_001130005 | NaN |
3 | COL8A1 | 3.770000e-04 | -12.50 | 9.00330 | -0.629488 | NM_001850,NM_020351 | NaN |
4 | SRRM2 | 3.990000e-04 | -12.20 | 8.77591 | -0.730824 | NM_016333 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
3118 | PHF20 | 4.980000e-02 | -3.68 | -1.74472 | -0.165499 | NM_016436 | NaN |
3119 | CTGF | 4.980000e-02 | -3.68 | -1.74515 | -0.219036 | NM_001901 | NaN |
3120 | ADNP2 | 4.980000e-02 | -3.68 | -1.74565 | -0.189624 | NM_014913 | NaN |
3121 | ATP2A2 | 4.990000e-02 | -3.68 | -1.74633 | -0.165539 | NM_001135765,NM_001681,NM_170665 | NaN |
3122 | C1orf198 | 4.990000e-02 | -3.68 | -1.74677 | -0.194524 | NM_001136494,NM_001136495,NM_032800 | NaN |
3123 rows × 7 columns
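The same filter, drop and select sequence is repeated for every compound and concentration, so it could be wrapped in one function. The following is a sketch under that assumption. The name `prepare_deg_table` and the toy values are hypothetical, and the index reset is an addition that the notebook itself does not perform.

```python
import pandas as pd

# Column order used for the exported tables in this section.
MAPPING_COLUMNS = ["Gene.Symbol", "adj.P.Val", "t", "B", "logFC", "GB_LIST", "SPOT_ID"]


def prepare_deg_table(raw, alpha=0.05):
    """Keep significant genes, drop the ID column, and order the columns."""
    significant = raw[raw["adj.P.Val"] < alpha]
    prepared = significant.drop(columns="ID")[MAPPING_COLUMNS]
    # Resetting the index is optional; it only tidies the displayed table.
    return prepared.reset_index(drop=True)


# Toy input with placeholder identifiers (not real probes or accessions).
toy = pd.DataFrame({
    "ID": ["probe_1", "probe_2", "probe_3"],
    "adj.P.Val": [0.001, 0.20, 0.04],
    "t": [12.0, 1.1, -3.9],
    "B": [8.5, -4.0, -1.2],
    "logFC": [1.3, 0.05, -0.4],
    "Gene.Symbol": ["GENE_A", "GENE_B", "GENE_C"],
    "GB_LIST": ["ACC_1", "ACC_2", "ACC_3"],
    "SPOT_ID": [None, None, None],
})

prepared = prepare_deg_table(toy)
print(prepared)
```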
step 49: You will now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names to True, because the first row of the exported expression table holds the column names. Set table to 'node' and table_key_column to 'CTL.geneID', so that the expression data are matched against the gene identifiers that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration1.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}
step 50: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 51: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 52: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.7.2 Concentration 2
step 53: You first import the expression table into a new dataframe, keep the genes with an adjusted p-value below 0.05, drop the ID column, and select the columns needed for the mapping. The final table is exported so that it can be used as input for the py4cytoscape function.
Troglitazone_concentration2= pd.read_csv('GSE69844.TroglitazoneConcentration2.tsv',sep='\t')
Adjusted_Troglitazone_concentration2= Troglitazone_concentration2[Troglitazone_concentration2['adj.P.Val'] < 0.05]
Troglitazone_concentration_2=Adjusted_Troglitazone_concentration2.drop('ID', axis=1)
Troglitazone_Concentration2 = Troglitazone_concentration_2[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration2.to_excel('GSE69844.Troglitazone-concentration2.xlsx',index=False)
Troglitazone_Concentration2
Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID | |
---|---|---|---|---|---|---|---|
0 | FABP4 | 1.000000e-12 | 57.329825 | 13.74886 | 3.351827 | NM_001442 | NaN |
1 | PLIN4 | 2.340000e-05 | 14.969008 | 9.66342 | 0.709311 | NM_001080400 | NaN |
2 | ATP2B4 | 5.990000e-05 | 13.451635 | 8.95408 | 0.550716 | NM_001001396,NM_001684 | NaN |
3 | PDK4 | 5.990000e-05 | 13.146994 | 8.79403 | 1.124497 | NM_002612 | NaN |
4 | DLC1 | 9.970000e-05 | 12.396132 | 8.37088 | 0.527070 | NM_001164271,NM_006094,NM_024767,NM_182643 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
69 | TXNIP | 4.420000e-02 | 5.718843 | 1.90694 | 0.311458 | NM_006472 | NaN |
70 | PHLDA3 | 4.420000e-02 | 5.712639 | 1.89787 | 0.227240 | NM_012396 | NaN |
71 | ATP1B1 | 4.600000e-02 | -5.680871 | 1.85131 | -0.231002 | NM_001001787,NM_001677 | NaN |
72 | ANKRD1 | 4.690000e-02 | -5.661073 | 1.82221 | -0.295793 | NM_014391 | NaN |
73 | ZBED3 | 4.920000e-02 | 5.625298 | 1.76945 | 0.301710 | NM_032367 | NaN |
74 rows × 7 columns
step 54: You will now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names to True, because the first row of the exported expression table holds the column names. Set table to 'node' and table_key_column to 'CTL.geneID', so that the expression data are matched against the gene identifiers that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration2.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}
step 55: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 56: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 57: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
4.7.3 Concentration 3
step 58: You first import the expression table into a new dataframe, keep the genes with an adjusted p-value below 0.05, drop the ID column, and select the columns needed for the mapping. The final table is exported so that it can be used as input for the py4cytoscape function.
Troglitazone_concentration3= pd.read_csv('GSE69844.TroglitazoneConcentration3.tsv',sep='\t')
Adjusted_Troglitazone_concentration3= Troglitazone_concentration3[Troglitazone_concentration3['adj.P.Val'] < 0.05]
Troglitazone_concentration_3=Adjusted_Troglitazone_concentration3.drop('ID', axis=1)
Troglitazone_Concentration3 = Troglitazone_concentration_3[['Gene.Symbol', 'adj.P.Val', 't', 'B','logFC','GB_LIST','SPOT_ID']]
Troglitazone_Concentration3.to_excel('GSE69844.Troglitazone-concentration3.xlsx',index=False)
Troglitazone_Concentration3
Gene.Symbol | adj.P.Val | t | B | logFC | GB_LIST | SPOT_ID | |
---|---|---|---|---|---|---|---|
0 | FABP4 | 2.190000e-17 | 63.189122 | 33.180151 | 3.760460 | NM_001442 | NaN |
1 | PDK4 | 8.910000e-12 | 27.858041 | 25.280555 | 1.669789 | NM_002612 | NaN |
2 | KLF9 | 8.910000e-12 | -27.795929 | 25.251639 | -1.271744 | NM_001206 | NaN |
3 | INSIG1 | 8.910000e-12 | 27.633020 | 25.175329 | 1.021568 | NM_005542,NM_198336,NM_198337 | NaN |
4 | CYP1B1 | 4.890000e-11 | 24.684925 | 23.668933 | 1.327687 | NM_000104 | NaN |
... | ... | ... | ... | ... | ... | ... | ... |
9171 | RPA2 | 4.990000e-02 | 2.923957 | -3.330818 | 0.110598 | NM_002946 | NaN |
9172 | INHBE | 4.990000e-02 | 2.923939 | -3.330855 | 0.214330 | NM_031479 | NaN |
9173 | C5orf41 | 4.990000e-02 | -2.923657 | -3.331433 | -0.113331 | NM_001168393,NM_001168394,NM_153607 | NaN |
9174 | C7orf68 | 4.990000e-02 | 2.923642 | -3.331465 | 0.281327 | NM_001098786,NM_013332 | NaN |
9175 | RPL24 | 4.990000e-02 | 2.923395 | -3.331971 | 0.112186 | NM_000986 | NaN |
9176 rows × 7 columns
step 59: You will now integrate the expression table into the node table of the AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names to True, because the first row of the exported expression table holds the column names. Set table to 'node' and table_key_column to 'CTL.geneID', so that the expression data are matched against the gene identifiers that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('GSE69844.Troglitazone-concentration3.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}
step 60: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC')
step 61: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 62: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default')
''
Section 5: Mapping of dataset: GSE44729
In this section, you will map the transcriptomics expression data of dataset GSE44729. This dataset transcriptionally profiles BEAS-2B cells to compare controls with skin sensitizers, controls with respiratory sensitizers, and controls with non-sensitizing irritants.
5.1 ACR exposure time 1
step 63: You first import the expression table into a new dataframe.
ACR_10h= pd.read_csv('adaptedACR10h.tsv',sep='\t')
ACR_10h
ID | COL | ROW | NAME | SPOT_ID | CONTROL_TYPE | REFSEQ | GB_ACC | GENE | GENE_SYMBOL | ... | SEQUENCE | SPOT_ID.1 | ORDER | logFC | AveExpr | t | P.Value | padj | B | ENSEMBLE_GENE_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 30939 | 85 | 3 | A_24_P102821 | A_24_P102821 | False | NM_000952 | NM_000952 | 5724 | PTAFR | ... | ATACGGTCACTGAAGTGGTTGTGCCATTCAACCAGATCCCTGGCAA... | NaN | 30939 | 0.472665 | 0.049355 | 5.812795 | 2.199154e-07 | 0.005924 | 5.278602 | ENSG00000169403 |
1 rows × 29 columns
step 64: The table contains a single gene, PTAFR. Unfortunately, this gene is not included in the AOP network and thus can't be mapped.
p4c.load_table_data_from_file('adaptedACR10h.tsv', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID')
{'mappedTables': [388893, 388931]}
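Whether a gene can be mapped can be checked before loading by comparing the key values in the expression table with the key column of the node table. The following is a sketch, assuming the node key values have already been fetched into a Python list. The helper `split_by_network_membership` is hypothetical, and the `node_keys` values are toy placeholders, not the real content of the network.

```python
import pandas as pd


def split_by_network_membership(expression, key_column, node_keys):
    """Split an expression table into rows found / not found in the network."""
    in_network = expression[key_column].isin(list(node_keys))
    return expression[in_network], expression[~in_network]


# Single-gene table as in this step; logFC copied from the output above.
expression = pd.DataFrame({"GENE_SYMBOL": ["PTAFR"], "logFC": [0.472665]})

# Toy placeholder for the key values present in the node table.
node_keys = ["HMOX1", "CYP1A1", "KEAP1"]

mapped, unmapped = split_by_network_membership(expression, "GENE_SYMBOL", node_keys)
print(len(mapped), "mappable;", list(unmapped["GENE_SYMBOL"]), "not in network")
```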
5.2 ACR exposure time 2
step 65: You first import the expression table into a new dataframe.
ACR_24h= pd.read_csv('adaptedACR24h.tsv',sep='\t')
ACR_24h
ID | COL | ROW | NAME | SPOT_ID | CONTROL_TYPE | REFSEQ | GB_ACC | GENE | GENE_SYMBOL | ... | SEQUENCE | SPOT_ID.1 | ORDER | logFC | AveExpr | t | P.Value | padj | B | ENSEMBLE_GENE_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 42685 | 15 | 142 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 42685 | 1.179039 | 0.954675 | 6.285084 | 3.447616e-08 | 0.000253 | 6.360406 | ENSG00000100292 |
1 | 40823 | 26 | 126 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 40823 | 1.166378 | 0.956092 | 6.242149 | 4.085234e-08 | 0.000253 | 6.263043 | ENSG00000100292 |
2 | 9749 | 209 | 54 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 9749 | 1.173634 | 0.954391 | 6.235767 | 4.189503e-08 | 0.000253 | 6.248567 | ENSG00000100292 |
3 | 29347 | 94 | 127 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 29347 | 1.174612 | 0.964019 | 6.234992 | 4.202344e-08 | 0.000253 | 6.246809 | ENSG00000100292 |
4 | 43478 | 11 | 85 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 43478 | 1.181977 | 0.956175 | 6.228517 | 4.311172e-08 | 0.000253 | 6.232119 | ENSG00000100292 |
5 | 36362 | 53 | 37 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 36362 | 1.180659 | 0.960191 | 6.218832 | 4.479195e-08 | 0.000253 | 6.210143 | ENSG00000100292 |
6 | 4189 | 242 | 123 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 4189 | 1.170264 | 0.957791 | 6.176783 | 5.287237e-08 | 0.000253 | 6.114708 | ENSG00000100292 |
7 | 20842 | 144 | 137 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 20842 | 1.165382 | 0.963625 | 6.148413 | 5.912487e-08 | 0.000253 | 6.050293 | ENSG00000100292 |
8 | 6877 | 226 | 18 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 6877 | 1.166841 | 0.951196 | 6.142273 | 6.057178e-08 | 0.000253 | 6.036349 | ENSG00000100292 |
9 | 16256 | 171 | 129 | A_23_P120883 | A_23_P120883 | False | NM_002133 | NM_002133 | 3162 | HMOX1 | ... | TGGGGAGGGAGGTGTTTAACGGCACTGTGGCCTTGGTCTAACTTTT... | NaN | 16256 | 1.162878 | 0.964791 | 6.137396 | 6.174589e-08 | 0.000253 | 6.025273 | ENSG00000100292 |
10 | 26200 | 112 | 132 | A_32_P42684 | A_32_P42684 | False | NM_014331 | NM_014331 | 23657 | SLC7A11 | ... | TTACTGATACTAAATGTTGGCTACCTGTGATTTTATAGTATGCACA... | NaN | 26200 | 0.752090 | 0.640196 | 5.433205 | 9.494058e-07 | 0.003561 | 4.425625 | ENSG00000151012 |
11 | 21063 | 143 | 35 | A_23_P212655 | A_23_P212655 | False | NM_130446 | NM_130446 | 89857 | KLHL6 | ... | TTCTGGTCTCAATGGCTTCGGGAAACACACATATACACATACACCA... | NaN | 21063 | -0.748734 | -0.040346 | -4.660551 | 1.681000e-05 | 0.031843 | 2.698548 | ENSG00000172578 |
12 | 28950 | 96 | 72 | A_23_P313828 | A_23_P313828 | False | NM_181716 | NM_181716 | 201161 | CENPV | ... | TTTGACTGCAATTGCAGCATTTGCAAGAAGAAGCAGAATAGACACT... | NaN | 28950 | -0.234981 | 0.018625 | -4.523276 | 2.750745e-05 | 0.042802 | 2.398681 | ENSG00000166582 |
13 | 18739 | 156 | 94 | A_23_P25487 | A_23_P25487 | False | NM_018018 | NM_018018 | 55089 | SLC38A4 | ... | TGTTCTGGTCATCCTTGTGCCAACTATAAAATACATCTTCGGATTC... | NaN | 18739 | 0.293797 | -0.025377 | 4.464863 | 3.385527e-05 | 0.047625 | 2.271960 | ENSG00000139209 |
14 | 21386 | 141 | 69 | A_23_P163402 | A_23_P163402 | False | NM_000499 | NM_000499 | 1543 | CYP1A1 | ... | GGTAAAACAGGGCCACATAGATGCTGATGGAGCCTTCCCAAGTTGT... | NaN | 21386 | -0.234158 | 0.184573 | -4.450278 | 3.565014e-05 | 0.047916 | 2.240408 | ENSG00000140465 |
15 | 32586 | 75 | 109 | A_32_P165477 | A_32_P165477 | False | NM_014331 | NM_014331 | 23657 | SLC7A11 | ... | CATTTTGCTTTCCTAACCATTCAGTCAGGAATTAAAATATGGCATT... | NaN | 32586 | 0.758739 | 0.723792 | 4.431768 | 3.806166e-05 | 0.048953 | 2.200414 | ENSG00000151012 |
16 rows × 29 columns
step 66: You now prepare the expression table for mapping. You keep the genes with an adjusted p-value (padj) below 0.05, drop the ID column, select the columns needed for the mapping, and export the result to an Excel file that serves as input for the py4cytoscape function.
ACR24h= pd.read_csv('adaptedACR24h.tsv',sep='\t')
ACR24h_version1= ACR24h[ACR24h['padj'] < 0.05]
ACR24h_version2=ACR24h_version1.drop('ID', axis=1)
ACR24h_version3= ACR24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
ACR24h_version3.to_excel('ACR24h-adjusted.xlsx',index=False)
ACR24h_version3
GENE_SYMBOL | padj | t | B | logFC | SPOT_ID | |
---|---|---|---|---|---|---|
0 | HMOX1 | 0.000253 | 6.285084 | 6.360406 | 1.179039 | A_23_P120883 |
1 | HMOX1 | 0.000253 | 6.242149 | 6.263043 | 1.166378 | A_23_P120883 |
2 | HMOX1 | 0.000253 | 6.235767 | 6.248567 | 1.173634 | A_23_P120883 |
3 | HMOX1 | 0.000253 | 6.234992 | 6.246809 | 1.174612 | A_23_P120883 |
4 | HMOX1 | 0.000253 | 6.228517 | 6.232119 | 1.181977 | A_23_P120883 |
5 | HMOX1 | 0.000253 | 6.218832 | 6.210143 | 1.180659 | A_23_P120883 |
6 | HMOX1 | 0.000253 | 6.176783 | 6.114708 | 1.170264 | A_23_P120883 |
7 | HMOX1 | 0.000253 | 6.148413 | 6.050293 | 1.165382 | A_23_P120883 |
8 | HMOX1 | 0.000253 | 6.142273 | 6.036349 | 1.166841 | A_23_P120883 |
9 | HMOX1 | 0.000253 | 6.137396 | 6.025273 | 1.162878 | A_23_P120883 |
10 | SLC7A11 | 0.003561 | 5.433205 | 4.425625 | 0.752090 | A_32_P42684 |
11 | KLHL6 | 0.031843 | -4.660551 | 2.698548 | -0.748734 | A_23_P212655 |
12 | CENPV | 0.042802 | -4.523276 | 2.398681 | -0.234981 | A_23_P313828 |
13 | SLC38A4 | 0.047625 | 4.464863 | 2.271960 | 0.293797 | A_23_P25487 |
14 | CYP1A1 | 0.047916 | -4.450278 | 2.240408 | -0.234158 | A_23_P163402 |
15 | SLC7A11 | 0.048953 | 4.431768 | 2.200414 | 0.758739 | A_32_P165477 |
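The table contains several rows for HMOX1 and SLC7A11, so one gene node may match more than one row. How the loader resolves such duplicates is not described in this notebook. One option, shown here as an assumption rather than as the method used in this notebook, is to keep a single row per gene symbol, for example the row with the largest absolute t statistic. The helper name `collapse_probes` is hypothetical.

```python
import pandas as pd


def collapse_probes(table, gene_column="GENE_SYMBOL", stat_column="t"):
    """Keep one row per gene: the one with the largest absolute statistic."""
    # Order the row labels from strongest to weakest absolute statistic.
    order = table[stat_column].abs().sort_values(ascending=False).index
    strongest = table.loc[order].drop_duplicates(subset=gene_column)
    # Restore the original row order before renumbering.
    return strongest.sort_index().reset_index(drop=True)


# A few rows copied from the table above.
probes = pd.DataFrame({
    "GENE_SYMBOL": ["HMOX1", "HMOX1", "SLC7A11", "KLHL6", "SLC7A11"],
    "t": [6.285084, 6.242149, 5.433205, -4.660551, 4.431768],
    "logFC": [1.179039, 1.166378, 0.752090, -0.748734, 0.758739],
})

collapsed = collapse_probes(probes)
print(collapsed)
```

Other summaries, such as the mean logFC per gene, are equally possible; the choice depends on how you want repeated probes to be represented in the network.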
step 67: You now set the clone of the network as the current network.
p4c.set_current_network('clone-GSE69844')
{}
step 68: You will now integrate the expression table into the node table of the cloned AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names to True, because the first row of the table holds the column names. Set table to 'node', table_key_column to 'CTL.geneID' and network to 'clone-GSE69844', so that the expression data are matched against the gene identifiers that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('ACR24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.geneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}
step 69: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')
step 70: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 71: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''
5.3 MA exposure time 1
step 72: You first import the expression table into a new dataframe.
MA_10h= pd.read_csv('adaptedMA10h.tsv',sep='\t')
MA_10h
ID | COL | ROW | NAME | SPOT_ID | CONTROL_TYPE | REFSEQ | GB_ACC | GENE | GENE_SYMBOL | ... | SEQUENCE | SPOT_ID.1 | ORDER | logFC | AveExpr | t | P.Value | padj | B | ENSEMBLE_GENE_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 20269 | 147 | 94 | A_23_P435029 | A_23_P435029 | False | NaN | BC015544 | NaN | H3C14 | ... | CATCACAGTTGACAGGTTAAAAGCATTCACTGCAGCGATCTATGAG... | NaN | 20269 | -0.424471 | 0.018465 | -5.210051 | 0.000002 | 0.024894 | 3.914649 | ENSG00000203811 |
1 | 5888 | 232 | 125 | A_23_P428298 | A_23_P428298 | False | NM_173561 | NM_173561 | 222643.0 | UNC5CL | ... | GGGGATATTTTCCCCATGGATCAAGATCCAGTTTAGGGTTGGGAAA... | NaN | 5888 | -0.602074 | 0.081130 | -4.989385 | 0.000005 | 0.045422 | 3.420390 | ENSG00000124602 |
2 | 9982 | 208 | 97 | A_24_P159434 | A_24_P159434 | False | NM_007261 | NM_007261 | 11314.0 | CD300A | ... | AGTTTCTCTGGACTCTTAGGTTTATTTTTAATATGAAATATAAAAA... | NaN | 9982 | 0.463349 | -0.037320 | 4.850629 | 0.000008 | 0.048363 | 3.111913 | ENSG00000167851 |
3 | 19016 | 155 | 49 | A_23_P87678 | A_23_P87678 | False | NM_004950 | NM_004950 | 1833.0 | EPYC | ... | GGATTGATCTGACATCAAATTTAATATCTGAGATTGATGAAGATGC... | NaN | 19016 | 0.281188 | 0.000073 | 4.846602 | 0.000009 | 0.048363 | 3.102991 | ENSG00000083782 |
4 rows × 29 columns
step 73: Unfortunately, these four genes are not included in the AOP network and thus can't be mapped.
5.4 MA exposure time 2
step 74: You first import the expression table into a new dataframe.
MA_24h= pd.read_csv('adaptedMA24h.tsv',sep='\t')
MA_24h
ID | COL | ROW | NAME | SPOT_ID | CONTROL_TYPE | REFSEQ | GB_ACC | GENE | GENE_SYMBOL | ... | SEQUENCE | SPOT_ID.1 | ORDER | logFC | AveExpr | t | P.Value | padj | B | ENSEMBLE_GENE_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 21386 | 141 | 69 | A_23_P163402 | A_23_P163402 | False | NM_000499 | NM_000499 | 1543.0 | CYP1A1 | ... | GGTAAAACAGGGCCACATAGATGCTGATGGAGCCTTCCCAAGTTGT... | NaN | 21386 | -0.821543 | 0.184573 | -15.613779 | 2.421857e-23 | 5.450995e-19 | 41.618011 | ENSG00000140465 |
1 | 16486 | 170 | 9 | A_23_P257803 | A_23_P257803 | False | NM_013391 | NM_013391 | 29958.0 | DMGDH | ... | TGGTATTGACCGAACCAACCAGAAACCGGCTTCAGAAAAAAGGTGG... | NaN | 16486 | -0.876482 | 0.050920 | -14.794873 | 3.529727e-22 | 5.296355e-18 | 39.338343 | ENSG00000132837 |
2 | 20269 | 147 | 94 | A_23_P435029 | A_23_P435029 | False | NaN | BC015544 | NaN | H3C14 | ... | CATCACAGTTGACAGGTTAAAAGCATTCACTGCAGCGATCTATGAG... | NaN | 20269 | -1.086935 | 0.018465 | -13.341288 | 5.056895e-20 | 5.690903e-16 | 35.039718 | ENSG00000203811 |
3 | 16942 | 167 | 117 | A_23_P1676 | A_23_P1676 | False | NM_001080546 | NM_001080546 | 219854.0 | TMEM218 | ... | TACCCGTACCTTAGGATTTCCAACTGTTTTGAAAGGGAAATAGTAA... | NaN | 16942 | 0.455706 | -0.067257 | 12.524192 | 9.277620e-19 | 8.352641e-15 | 32.479715 | ENSG00000150433 |
4 | 25809 | 115 | 63 | A_23_P147950 | A_23_P147950 | False | NM_152419 | NM_152419 | 138050.0 | HGSNAT | ... | TCTTTGGAACTTCATTCCGAGGAGATAAGCTTTAACTTTCCAAAAG... | NaN | 25809 | 0.471999 | -0.082378 | 12.106840 | 4.236592e-18 | 3.178503e-14 | 31.132408 | ENSG00000165102 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2541 | 15620 | 175 | 41 | A_24_P331704 | A_24_P331704 | False | NM_182507 | NM_182507 | 144501.0 | KRT80 | ... | CCAAGGGAGCAAATCCTCAGTGGGGATACAAGACATATAAAGTATA... | NaN | 15620 | -0.301665 | 0.012832 | -2.916673 | 4.895606e-03 | 4.981367e-02 | -0.681984 | ENSG00000167767 |
2542 | 26372 | 111 | 128 | A_23_P351724 | A_23_P351724 | False | NM_022648 | NM_022648 | 7145.0 | TNS1 | ... | CTCTAAGCCAGAATGGAAAATTCACCAGGACTCCATTCTTAAGCCT... | NaN | 26372 | 0.312052 | -0.254361 | 2.916410 | 4.899232e-03 | 4.983931e-02 | -0.682639 | ENSG00000079308 |
2543 | 40411 | 29 | 99 | A_23_P369701 | A_23_P369701 | False | NM_021214 | NM_021214 | 58489.0 | ABHD17C | ... | ATTACTAGCCAACAGAGTTTTACTATTTTGATTGTCTGGTTGGTTT... | NaN | 40411 | 0.200178 | 0.007205 | 2.915475 | 4.912166e-03 | 4.993703e-02 | -0.684974 | ENSG00000136379 |
2544 | 33634 | 69 | 53 | A_23_P62115 | A_23_P62115 | False | NM_003254 | NM_003254 | 7076.0 | TIMP1 | ... | CATGGAGAGTGTCTGCGGATACTTCCACAGGTCCCACAACCGCAGC... | NaN | 33634 | -0.221966 | 0.073864 | -2.915367 | 4.913662e-03 | 4.994095e-02 | -0.685243 | ENSG00000102265 |
2545 | 29769 | 91 | 134 | A_23_P133279 | A_23_P133279 | False | NM_199133 | NM_199133 | 134145.0 | ATPSCKMT | ... | CTTGAGAGCTGCCACTCATTTAATATTTCTCATTTATGAGAAGAGA... | NaN | 29769 | 0.147226 | -0.003311 | 2.915143 | 4.916752e-03 | 4.996108e-02 | -0.685800 | ENSG00000150756 |
2546 rows × 29 columns
step 75: You now prepare the expression table for mapping. You keep the genes with an adjusted p-value (padj) below 0.05, drop the ID column, select the columns needed for the mapping, and export the result to an Excel file that serves as input for the py4cytoscape function.
MA24h= pd.read_csv('adaptedMA24h.tsv',sep='\t')
MA24h_version1= MA24h[MA24h['padj'] < 0.05]
MA24h_version2=MA24h_version1.drop('ID', axis=1)
MA24h_version3= MA24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
MA24h_version3.to_excel('MA24h-adjusted.xlsx',index=False)
MA24h_version3
GENE_SYMBOL | padj | t | B | logFC | SPOT_ID | |
---|---|---|---|---|---|---|
0 | CYP1A1 | 5.450995e-19 | -15.613779 | 41.618011 | -0.821543 | A_23_P163402 |
1 | DMGDH | 5.296355e-18 | -14.794873 | 39.338343 | -0.876482 | A_23_P257803 |
2 | H3C14 | 5.690903e-16 | -13.341288 | 35.039718 | -1.086935 | A_23_P435029 |
3 | TMEM218 | 8.352641e-15 | 12.524192 | 32.479715 | 0.455706 | A_23_P1676 |
4 | HGSNAT | 3.178503e-14 | 12.106840 | 31.132408 | 0.471999 | A_23_P147950 |
... | ... | ... | ... | ... | ... | ... |
2541 | KRT80 | 4.981367e-02 | -2.916673 | -0.681984 | -0.301665 | A_24_P331704 |
2542 | TNS1 | 4.983931e-02 | 2.916410 | -0.682639 | 0.312052 | A_23_P351724 |
2543 | ABHD17C | 4.993703e-02 | 2.915475 | -0.684974 | 0.200178 | A_23_P369701 |
2544 | TIMP1 | 4.994095e-02 | -2.915367 | -0.685243 | -0.221966 | A_23_P62115 |
2545 | ATPSCKMT | 4.996108e-02 | 2.915143 | -0.685800 | 0.147226 | A_23_P133279 |
2546 rows × 6 columns
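Before mapping a table of this size, a quick count of up- and down-regulated genes can serve as a sanity check on the sign of logFC. The following is a minimal sketch. The helper `direction_counts` is hypothetical, and the example uses only the first five rows shown above.

```python
import pandas as pd


def direction_counts(table, column="logFC"):
    """Count rows with positive (up) and negative (down) log fold change."""
    return {
        "up": int((table[column] > 0).sum()),
        "down": int((table[column] < 0).sum()),
    }


# First five rows of the table above.
head = pd.DataFrame({
    "GENE_SYMBOL": ["CYP1A1", "DMGDH", "H3C14", "TMEM218", "HGSNAT"],
    "logFC": [-0.821543, -0.876482, -1.086935, 0.455706, 0.471999],
})

counts = direction_counts(head)
print(counts)
```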
step 76: You now set the clone of the network as the current network.
p4c.set_current_network('clone-GSE69844')
{}
step 77: You will now integrate the expression table into the node table of the cloned AOP network using the function p4c.load_table_data_from_file. Set first_row_as_column_names to True, because the first row of the table holds the column names. Set table to 'node', table_key_column to 'CTL.GeneID' and network to 'clone-GSE69844', so that the expression data are matched against the gene identifiers that the CyTargetLinker extension added in the previous notebook.
p4c.load_table_data_from_file('MA24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}
step 78: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')
step 79: Next, you define the color scheme so that low and high expression values receive distinct colors. The color scheme uses the following colors:
- Low expression value (minimum) = blue node color
- No expression value = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 80: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''
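To get a feeling for what a continuous mapping does with values between the breakpoints, the sketch below computes a node color by hand. It assumes linear interpolation in RGB space between neighbouring breakpoints; Cytoscape's exact rendering may differ, and the function names are hypothetical helpers that are not part of py4cytoscape.

```python
def hex_to_rgb(color):
    """Convert '#RRGGBB' into an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def interpolate_color(value, breakpoints, colors):
    """Interpolate a hex color for value; breakpoints must be strictly increasing."""
    if value <= breakpoints[0]:
        return colors[0].upper()
    if value >= breakpoints[-1]:
        return colors[-1].upper()
    segments = zip(breakpoints, breakpoints[1:], colors, colors[1:])
    for lower, upper, color_lower, color_upper in segments:
        if lower <= value <= upper:
            fraction = (value - lower) / (upper - lower)
            rgb = [
                round(a + fraction * (b - a))
                for a, b in zip(hex_to_rgb(color_lower), hex_to_rgb(color_upper))
            ]
            return "#{:02X}{:02X}{:02X}".format(*rgb)


palette = ["#0000FF", "#FFFFFF", "#FF0000"]
print(interpolate_color(0.0, [-1.0, 0.0, 1.0], palette))  # value at the middle breakpoint
print(interpolate_color(0.5, [-1.0, 0.0, 1.0], palette))  # halfway between white and red
```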
5.5 CP exposure time 1
step 81: You first import the expression table into a new dataframe.
CP_10h= pd.read_csv('adaptedCP10h.tsv',sep='\t')
CP_10h
ID | COL | ROW | NAME | SPOT_ID | CONTROL_TYPE | REFSEQ | GB_ACC | GENE | GENE_SYMBOL | ... | SEQUENCE | SPOT_ID.1 | ORDER | logFC | AveExpr | t | P.Value | padj | B | ENSEMBLE_GENE_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26605 | 110 | 2 | A_23_P216225 | A_23_P216225 | False | NM_004430 | NM_004430 | 1960 | EGR3 | ... | GGTTGTGAATTTCCAGGTACTTGGACTTTTTGTAGAAGTAGAGAGA... | NaN | 26605 | 0.863408 | 0.094340 | 7.955544 | 4.246899e-11 | 0.000002 | 10.081124 | ENSG00000179388 |
1 | 41618 | 22 | 65 | A_23_P212639 | A_23_P212639 | False | NM_004593 | NM_004593 | 6434 | TRA2A | ... | GCATTTGTGTAGTTTGGTGCTTTGTTCCAAGTTAAGTGTTTTCAGA... | NaN | 41618 | 0.232531 | 0.026262 | 6.309106 | 3.135061e-08 | 0.000470 | 6.437660 | ENSG00000164548 |
2 | 23385 | 129 | 151 | A_24_P416370 | A_24_P416370 | False | NM_024015 | NM_024015 | 3214 | HOXB4 | ... | CAGCAGAAGCCTCTCTCCTAGACTGAAAATGAATGTGAAACTAGGA... | NaN | 23385 | -0.241820 | -0.033594 | -5.680978 | 3.665825e-07 | 0.002750 | 5.006423 | ENSG00000182742 |
3 | 12053 | 196 | 35 | A_23_P79155 | A_23_P79155 | False | NM_001508 | NM_001508 | 2863 | GPR39 | ... | TGGAAGAACAATGCAGGAGGGGGTGGCATCTCCTTCAGCTTCAGCA... | NaN | 12053 | -0.201463 | -0.027988 | -5.199354 | 2.302939e-06 | 0.010886 | 3.913296 | ENSG00000183840 |
4 | 2479 | 252 | 143 | A_23_P106194 | A_23_P106194 | False | NM_005252 | NM_005252 | 2353 | FOS | ... | AGAGGGTTCCTGTAGACCTAGGGAGGACCTTATCTGTGCGTGAAAC... | NaN | 2479 | 0.652862 | 0.097279 | 5.152892 | 2.742032e-06 | 0.011221 | 3.808527 | ENSG00000170345 |
5 | 5075 | 237 | 51 | A_23_P143143 | A_23_P143143 | False | NM_002166 | NM_002166 | 3398 | ID2 | ... | AGGCTTCTGAATTCCCTTCTGAGTTAATGTCAAATGACAGCAAAGC... | NaN | 5075 | 0.631542 | 0.093306 | 5.023820 | 4.439835e-06 | 0.014181 | 3.518391 | ENSG00000115738 |
6 | 18707 | 156 | 158 | A_23_P39704 | A_23_P39704 | False | NM_001031684 | NM_001031684 | 6432 | SRSF7 | ... | CTCTCTTCGTAGATCAAGATCAGCTTCACTCAGAAGATCTAGGTCT... | NaN | 18707 | 0.196261 | 0.014785 | 4.813274 | 9.650441e-06 | 0.024134 | 3.048559 | ENSG00000115875 |
7 | 1390 | 258 | 112 | A_23_P131846 | A_23_P131846 | False | NM_005985 | NM_005985 | 6615 | SNAI1 | ... | AACAATGTCTGAAAAGGGACTGTGAGTAATGGCTGTCACTTGTCGG... | NaN | 1390 | 0.515691 | 0.161391 | 4.589456 | 2.171080e-05 | 0.044439 | 2.554946 | ENSG00000124216 |
8 | 16555 | 169 | 42 | A_24_P401615 | A_24_P401615 | False | NM_001039361 | NM_001039361 | 343071 | PRAMEF10 | ... | TTACCTGAGCCAGATGAGCAATCTTCGTGAACTCTTTTTAGCCTTC... | NaN | 16555 | 0.595546 | 0.072522 | 4.589357 | 2.171847e-05 | 0.044439 | 2.554731 | ENSG00000187545 |
9 rows × 29 columns
step 82: You now prepare the expression table for integration into the node table of the AOP network. You keep the genes with an adjusted p-value (padj) below 0.05, drop the ID column, select the columns needed for the mapping (GENE_SYMBOL, padj, t, B, logFC and SPOT_ID) and save the result as the Excel file CP10h-adjusted.xlsx.
CP10h= pd.read_csv('adaptedCP10h.tsv',sep='\t')
CP10h_version1= CP10h[CP10h['padj'] < 0.05]
CP10h_version2=CP10h_version1.drop('ID', axis=1)
CP10h_version3= CP10h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
CP10h_version3.to_excel('CP10h-adjusted.xlsx',index=False)
CP10h_version3
GENE_SYMBOL | padj | t | B | logFC | SPOT_ID | |
---|---|---|---|---|---|---|
0 | EGR3 | 0.000002 | 7.955544 | 10.081124 | 0.863408 | A_23_P216225 |
1 | TRA2A | 0.000470 | 6.309106 | 6.437660 | 0.232531 | A_23_P212639 |
2 | HOXB4 | 0.002750 | -5.680978 | 5.006423 | -0.241820 | A_24_P416370 |
3 | GPR39 | 0.010886 | -5.199354 | 3.913296 | -0.201463 | A_23_P79155 |
4 | FOS | 0.011221 | 5.152892 | 3.808527 | 0.652862 | A_23_P106194 |
5 | ID2 | 0.014181 | 5.023820 | 3.518391 | 0.631542 | A_23_P143143 |
6 | SRSF7 | 0.024134 | 4.813274 | 3.048559 | 0.196261 | A_23_P39704 |
7 | SNAI1 | 0.044439 | 4.589456 | 2.554946 | 0.515691 | A_23_P131846 |
8 | PRAMEF10 | 0.044439 | 4.589357 | 2.554731 | 0.595546 | A_24_P401615 |
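The filtering and column selection are the same for every exposure condition, so they can be wrapped in a small function. This is a minimal sketch on a toy table with illustrative values; the function name select_significant and the constant KEEP_COLUMNS are hypothetical and do not appear in the original notebook.

```python
import pandas as pd

# Columns kept for the mapping, as in the notebook.
KEEP_COLUMNS = ["GENE_SYMBOL", "padj", "t", "B", "logFC", "SPOT_ID"]


def select_significant(table, alpha=0.05, columns=KEEP_COLUMNS):
    """Keep rows with padj below alpha and only the columns needed for mapping."""
    significant = table[table["padj"] < alpha]
    return significant[columns]


# Toy input (illustrative values); the real input is the imported .tsv table.
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "GENE_SYMBOL": ["EGR3", "FOS", "GENE_X"],
    "padj": [0.000002, 0.011221, 0.2],
    "t": [7.96, 5.15, 1.10],
    "B": [10.08, 3.81, -4.00],
    "logFC": [0.86, 0.65, 0.05],
    "SPOT_ID": ["A_23_P216225", "A_23_P106194", "PROBE_X"],
})

result = select_significant(toy)
print(result)
```

Calling the same function for each exposure file leaves the file name as the only thing that changes between sections, which reduces the risk of copy-and-paste mistakes.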
step 83: You now set the clone of the network as the current network.
p4c.set_current_network('clone-GSE69844')
{}
step 84: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the table holds the column names. You also select 'node' for table and 'CTL.GeneID' for table_key_column, as you want the expression data to be matched to the gene identifiers in the CTL.GeneID column that were added by the CyTargetLinker extension in the previous notebook.
p4c.load_table_data_from_file('CP10h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}
step 85: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')
step 86: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- Midpoint of the value range = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 87: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''
5.6 CP exposure time 2
step 88: You first import the expression table into a new dataframe.
CP_24h= pd.read_csv('adaptedCP24h.tsv',sep='\t')
CP_24h
ID | COL | ROW | NAME | SPOT_ID | CONTROL_TYPE | REFSEQ | GB_ACC | GENE | GENE_SYMBOL | ... | SEQUENCE | SPOT_ID.1 | ORDER | logFC | AveExpr | t | P.Value | padj | B | ENSEMBLE_GENE_ID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2985 | 249 | 151 | A_24_P270728 | A_24_P270728 | False | NM_001042483 | NM_001042483 | 26471.0 | NUPR1 | ... | TATTCCCGCTGACTGAGTCTCTGAGGGGCTACCAGGAAAGCGCCTC... | NaN | 2985 | 0.569941 | 0.167603 | 10.048946 | 1.026382e-14 | 4.620257e-10 | 22.754357 | ENSG00000176046 |
1 | 26605 | 110 | 2 | A_23_P216225 | A_23_P216225 | False | NM_004430 | NM_004430 | 1960.0 | EGR3 | ... | GGTTGTGAATTTCCAGGTACTTGGACTTTTTGTAGAAGTAGAGAGA... | NaN | 26605 | 1.036190 | 0.094340 | 9.547580 | 7.333753e-14 | 1.650645e-09 | 21.111511 | ENSG00000179388 |
2 | 20886 | 144 | 49 | A_23_P1691 | A_23_P1691 | False | NM_002421 | NM_002421 | 4312.0 | MMP1 | ... | ACATGTGCAGTCACTGGTGTCACCCTGGATAGGCAAGGGATAACTC... | NaN | 20886 | -1.110417 | 0.336235 | -9.060676 | 5.051693e-13 | 7.580066e-09 | 19.484600 | ENSG00000196611 |
3 | 38540 | 40 | 101 | A_23_P1691 | A_23_P1691 | False | NM_002421 | NM_002421 | 4312.0 | MMP1 | ... | ACATGTGCAGTCACTGGTGTCACCCTGGATAGGCAAGGGATAACTC... | NaN | 38540 | -1.050224 | 0.329034 | -8.267679 | 1.208709e-11 | 8.352296e-08 | 16.778923 | ENSG00000196611 |
4 | 33376 | 70 | 60 | A_24_P85300 | A_24_P85300 | False | NM_020733 | NM_020733 | 57493.0 | HEG1 | ... | AGGATGAGCGTACCACTGAAGTCTGAAGATGTCGCCATTGAACGGA... | NaN | 33376 | 0.404108 | -0.001845 | 8.218781 | 1.471445e-11 | 8.352296e-08 | 16.610201 | ENSG00000173706 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
671 | 17270 | 165 | 141 | A_32_P800179 | A_32_P800179 | False | NaN | AK094933 | NaN | SLC30A6 | ... | CTGTGTGTTAAAAGCATTGTATACGTGAAAAAGGACTCAAACTCAT... | NaN | 17270 | -0.295208 | -0.018674 | -3.350876 | 1.364742e-03 | 4.938413e-02 | 0.583949 | ENSG00000152683 |
672 | 20415 | 146 | 142 | A_23_P162589 | A_23_P162589 | False | NM_001017535 | NM_001017535 | 7421.0 | VDR | ... | CAAGCGAGGTCAACAGAGAAGGCAGGAATGTGTGGCAGATTTAGTG... | NaN | 20415 | -0.180143 | 0.077814 | -3.350611 | 1.365847e-03 | 4.938443e-02 | 0.583251 | ENSG00000111424 |
673 | 16828 | 168 | 5 | A_24_P4816 | A_24_P4816 | False | NM_031412 | NM_031412 | 23710.0 | GABARAPL1 | ... | GGATTGGCTTTGATAGAGGAATGGGGATGATGTAAGTTTACAGTAT... | NaN | 16828 | 0.328695 | 0.144492 | 3.350035 | 1.368258e-03 | 4.943190e-02 | 0.581731 | ENSG00000139112 |
674 | 7523 | 222 | 86 | A_23_P18413 | A_23_P18413 | False | NM_016589 | NM_016589 | 51300.0 | TIMMDC1 | ... | TGCTGACAAATTTAAGTGCTGGTACCTGTGGTGGCAGTGGCTTGCT... | NaN | 7523 | -0.104950 | 0.041448 | -3.348029 | 1.376679e-03 | 4.965340e-02 | 0.576441 | ENSG00000113845 |
675 | 43883 | 8 | 126 | A_23_P366983 | A_23_P366983 | False | NM_013381 | NM_013381 | 29953.0 | TRHDE | ... | AGTTACCACATATTCACGTTTATAAAATCCTTAATTAAATGAGTAA... | NaN | 43883 | 0.170322 | -0.008790 | 3.347479 | 1.378996e-03 | 4.966040e-02 | 0.574992 | ENSG00000072657 |
676 rows × 29 columns
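The table above lists MMP1 twice (rows 2 and 3, the same probe A_23_P1691 at two array positions). Because the gene symbol is later used as the key for the node table, it is worth deciding explicitly which row should represent such a gene. The sketch below keeps the row with the smallest padj per gene; this rule is an assumption for illustration and not part of the original workflow.

```python
import pandas as pd

# The NUPR1 row and both MMP1 rows from the table above (selected columns only).
deg = pd.DataFrame({
    "GENE_SYMBOL": ["NUPR1", "MMP1", "MMP1"],
    "padj": [4.620257e-10, 7.580066e-09, 8.352296e-08],
    "logFC": [0.569941, -1.110417, -1.050224],
})

# All rows that share their gene symbol with another row.
duplicated = deg[deg.duplicated("GENE_SYMBOL", keep=False)]

# Assumed rule: keep the most significant row (smallest padj) for each gene.
unique = (
    deg.sort_values("padj")
       .drop_duplicates("GENE_SYMBOL", keep="first")
       .reset_index(drop=True)
)

print(duplicated)
print(unique)
```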
step 89: You now prepare the expression table for integration into the node table of the AOP network. You keep the genes with an adjusted p-value (padj) below 0.05, drop the ID column, select the columns needed for the mapping (GENE_SYMBOL, padj, t, B, logFC and SPOT_ID) and save the result as the Excel file CP24h-adjusted.xlsx.
CP24h= pd.read_csv('adaptedCP24h.tsv',sep='\t')
CP24h_version1= CP24h[CP24h['padj'] < 0.05]
CP24h_version2=CP24h_version1.drop('ID', axis=1)
CP24h_version3= CP24h_version2[['GENE_SYMBOL', 'padj', 't', 'B','logFC','SPOT_ID']]
CP24h_version3.to_excel('CP24h-adjusted.xlsx',index=False)
CP24h_version3
step 90: You now set the clone of the network as the current network.
p4c.set_current_network('clone-GSE69844')
{}
step 91: You will now integrate the expression table into the node table of the AOP network. This is done with the function p4c.load_table_data_from_file, where you set first_row_as_column_names to True because the first row of the table holds the column names. You also select 'node' for table and 'CTL.GeneID' for table_key_column, as you want the expression data to be matched to the gene identifiers in the CTL.GeneID column that were added by the CyTargetLinker extension in the previous notebook.
p4c.load_table_data_from_file('CP24h-adjusted.xlsx', first_row_as_column_names=True,table='node', table_key_column='CTL.GeneID',network='clone-GSE69844')
{'mappedTables': [235742, 235780]}
step 92: You define the style for the mapping so that the expression values (log2FoldChange) are mapped to the gene nodes. This is done by first retrieving the log2FoldChange column.
Log2Foldchange_column = p4c.get_table_columns(table='node', columns='logFC',network='clone-GSE69844')
step 93: This is followed by the definition of the color scheme so that low and high expression values receive distinct colors. For the color scheme, you will use the following colors:
- Low expression value (minimum) = blue node color
- Midpoint of the value range = white node color
- High expression value (maximum) = red node color
This color scheme was also described in the official py4cytoscape documentation (1).
Blue_expression_color= Log2Foldchange_column.min().values[0]
Red_expression_color= Log2Foldchange_column.max().values[0]
White_expression_color= Blue_expression_color + (Red_expression_color -Blue_expression_color)/2
step 94: You apply this color scheme to the network.
p4c.set_node_color_mapping('logFC', [Blue_expression_color,White_expression_color,Red_expression_color], ['#0000FF', '#FFFFFF', '#FF0000'],mapping_type='c', style_name='default',network='clone-GSE69844')
''
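The load, retrieve and color steps are identical for every exposure condition in this section. As a hedged sketch, they could be bundled into one function that receives the py4cytoscape module as an argument. The function name map_expression and the parameter name client are hypothetical; the py4cytoscape calls and keyword arguments are the ones already used in the steps above.

```python
def map_expression(client, xlsx_file, network,
                   key_column="CTL.GeneID", value_column="logFC"):
    """Load an expression table into the node table and color nodes by value_column.

    client is expected to be the py4cytoscape module (import py4cytoscape as p4c),
    connected to a running Cytoscape session.
    """
    client.set_current_network(network)
    client.load_table_data_from_file(
        xlsx_file, first_row_as_column_names=True, table="node",
        table_key_column=key_column, network=network)

    # Retrieve the mapped column and derive the three breakpoints.
    column = client.get_table_columns(table="node", columns=value_column,
                                      network=network)
    low = column.min().values[0]
    high = column.max().values[0]
    mid = low + (high - low) / 2

    # Blue for the minimum, white for the midpoint, red for the maximum.
    client.set_node_color_mapping(
        value_column, [low, mid, high], ["#0000FF", "#FFFFFF", "#FF0000"],
        mapping_type="c", style_name="default", network=network)
    return low, mid, high
```

With Cytoscape running, a call such as map_expression(p4c, 'CP24h-adjusted.xlsx', 'clone-GSE69844') would then replace the separate cells for one condition.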
Section 6: Metadata
step 95: Finally, the metadata belonging to this Jupyter notebook is displayed, which contains the version numbers of the packages and the system set-up for interested users. This requires the packages watermark and print-versions.
%load_ext watermark
!pip install print-versions
Requirement already satisfied: print-versions in c:\users\shaki\anaconda3\lib\site-packages (0.1.0)
%watermark
Last updated: 2025-06-03T17:26:16.367030+02:00
Python implementation: CPython
Python version : 3.12.3
IPython version : 8.25.0
Compiler : MSC v.1938 64 bit (AMD64)
OS : Windows
Release : 11
Machine : AMD64
Processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
CPU cores : 8
Architecture: 64bit
from print_versions import print_versions
print_versions(globals())
json==2.0.9
ipykernel==6.28.0
numpy==1.26.4
pandas==2.2.2
ipywidgets==8.0.3
xarray==2023.6.0
py4cytoscape==1.9.0
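If installing an extra package is not desired, the versions can also be read with the Python standard library. The sketch below is a minimal alternative using importlib.metadata; the helper name installed_versions is hypothetical, and the list of package names is only an example.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["pandas", "py4cytoscape", "numpy"]).items():
    print(f"{name}=={version}" if version else f"{name} is not installed")
```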
Reference:
- Basic Data Visualization — py4cytoscape 0.0.5 documentation \[Internet\]. Readthedocs.io. 2021 \[cited 2025 Feb 26\]. Available from: https://py4cytoscape.readthedocs.io/en/0.0.5/tutorials/basic-data-visualization.html