Functions in FLUTE
This page describes the functions supported by the FLUTE tool: general functions that run without accessing the database, and query functions that access the flute.sql database.
General Functions
- run_FLUTE.filter_protein_ints(all_ints_df: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
This function returns the interactions that involve a protein as either the regulator or the regulated element.
- Parameters
all_ints_df (pd.DataFrame) – All interactions from the input file, as a DataFrame. It must include the following columns: ‘Regulated Name’, ‘Regulated ID’, ‘Regulated Type’, ‘Regulator Name’, ‘Regulator ID’, ‘Regulator Type’
- Returns
pt_only_ints – Interactions that involve a protein as either the regulator or the regulated element, formatted with only the name (lowercase) and ID columns.
- Return type
pd.DataFrame
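To make the expected input concrete, the sketch below builds a small DataFrame with the six required columns and applies a stand-in filter. The helper `keep_protein_rows`, the type label "protein", and the example rows are assumptions made for illustration only. They are not taken from the FLUTE source, and the real function also reduces the output to name and ID columns.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    "Regulated Name", "Regulated ID", "Regulated Type",
    "Regulator Name", "Regulator ID", "Regulator Type",
]

def keep_protein_rows(all_ints_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in: keep rows where either side is typed as a protein."""
    missing = [c for c in REQUIRED_COLUMNS if c not in all_ints_df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Assumption: element types are text labels such as "protein" or "chemical".
    regulated_is_protein = all_ints_df["Regulated Type"].str.lower() == "protein"
    regulator_is_protein = all_ints_df["Regulator Type"].str.lower() == "protein"
    keep = regulated_is_protein | regulator_is_protein
    return all_ints_df[keep].reset_index(drop=True)

# Illustrative rows only; the names and IDs are placeholders for real input data.
all_ints_df = pd.DataFrame(
    [
        ["geneA", "id1", "protein", "geneB", "id2", "protein"],
        ["chemX", "id3", "chemical", "geneC", "id4", "protein"],
        ["processY", "id5", "biological process", "chemX", "id3", "chemical"],
    ],
    columns=REQUIRED_COLUMNS,
)
protein_rows = keep_protein_rows(all_ints_df)
```

In this sketch the third row is dropped because neither side is a protein.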
- run_FLUTE.get_chem_id(pt_only_ints: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
This function adds CIDm information to protein-chemical interactions
- Parameters
pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with CIDm information
- Returns
pt_only_ints – Interaction dataframe with CIDm information filled out
- Return type
pd.DataFrame
- run_FLUTE.get_go_id(pt_only_ints: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
This function adds GO information to protein-biological process interactions
- Parameters
pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with GO information
- Returns
pt_only_ints – Interaction dataframe with GO information filled out
- Return type
pd.DataFrame
- run_FLUTE.get_string_id(pt_only_ints: pandas.core.frame.DataFrame, id_stringid_df: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
This function adds string_id information to the regulated element and the regulator of each interaction
- Parameters
pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with string_id information
id_stringid_df (pd.DataFrame) – Species dataframe that links id and string_id information
- Returns
pt_only_ints – Interaction dataframe with string_id information filled out
- Return type
pd.DataFrame
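One way to picture this step is as a lookup from each element's ID to its string_id, applied once to the regulated element and once to the regulator. The sketch below follows that idea. The helper `attach_string_ids` and the column names "ID", "stringID", "Regulated stringID", and "Regulator stringID" are assumptions for the example, not the names FLUTE uses internally.

```python
import pandas as pd

def attach_string_ids(ints: pd.DataFrame, lookup: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a stringID column for each side of an interaction."""
    out = ints.copy()
    # Build an ID -> stringID mapping from the lookup table (IDs assumed unique).
    id_to_string = lookup.set_index("ID")["stringID"]
    # Series.map leaves NaN where an ID has no entry in the lookup table.
    out["Regulated stringID"] = out["Regulated ID"].map(id_to_string)
    out["Regulator stringID"] = out["Regulator ID"].map(id_to_string)
    return out

# Placeholder identifiers, not real database entries.
ints = pd.DataFrame({"Regulated ID": ["id1", "id2"], "Regulator ID": ["id2", "id3"]})
lookup = pd.DataFrame({"ID": ["id1", "id2"], "stringID": ["s1", "s2"]})
grounded = attach_string_ids(ints, lookup)
```

Elements without a match ("id3" here) stay empty, which lets later steps tell grounded and ungrounded elements apart.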
- run_FLUTE.get_uid(pt_only_ints: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
This function adds uid information to the regulated element and the regulator of each interaction. The uid is the same as the ID when a corresponding stringID exists.
- Parameters
pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with uid information
- Returns
pt_only_ints – Interaction dataframe with uid information filled out
- Return type
pd.DataFrame
- run_FLUTE.extract_year(input_path)
Load the input OA_file into a pd.DataFrame with Year, PMCID, PMID columns
- Parameters
input_path (str) – Path to input OA_file
- Returns
year_df – A dataframe with Year, PMCID, PMID columns
- Return type
pd.DataFrame
- run_FLUTE.filter_recent_ints(f_in, year_df, x=5)
This function searches the input interaction file and finds all interactions occurring in papers less than X years old
- Parameters
f_in (str) – Input filename that contains the list of interactions. It must include a column “Paper IDs”
year_df (pd.DataFrame) – A dataframe with Year, PMCID, PMID columns
x (int) – Number of years; default is 5
- Returns
df – A dataframe containing all filtered interactions occurring in papers less than X years old
- Return type
pd.DataFrame
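The recency filter can be pictured as a join between the interactions and the year table, followed by an age cutoff. The sketch below shows that idea on in-memory DataFrames rather than on a file. It assumes that each “Paper IDs” cell holds a single PMCID and that age is measured against a reference year; the helper `recent_interactions` and its `current_year` parameter are hypothetical and not part of FLUTE.

```python
import pandas as pd

def recent_interactions(ints, year_df, x=5, current_year=None):
    """Hypothetical helper: keep interactions from papers less than x years old."""
    if current_year is None:
        current_year = pd.Timestamp.now().year
    # Attach the publication year to each interaction through its paper ID.
    merged = ints.merge(
        year_df[["PMCID", "Year"]],
        left_on="Paper IDs",
        right_on="PMCID",
        how="inner",
    )
    # "Less than x years old" is read here as a strict inequality.
    return merged[current_year - merged["Year"] < x].reset_index(drop=True)

# Placeholder paper IDs and years.
ints = pd.DataFrame({
    "Regulated ID": ["id1", "id2", "id3"],
    "Regulator ID": ["id4", "id5", "id6"],
    "Paper IDs": ["PMC1", "PMC2", "PMC3"],
})
year_df = pd.DataFrame({
    "Year": [2022, 2015, 2019],
    "PMCID": ["PMC1", "PMC2", "PMC3"],
    "PMID": ["1", "2", "3"],
})
recent = recent_interactions(ints, year_df, x=5, current_year=2024)
```

Passing the reference year explicitly keeps the result reproducible; with `current_year=2024` and `x=5`, only the 2022 paper qualifies.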
- run_FLUTE.get_duplicates_ints(f_in)
This function counts how many times each interaction is duplicated in a reading set. Interactions with the same Regulated ID, Regulator ID, and Paper ID are considered duplicates.
- Parameters
f_in (str) – Input filename that contains the list of interactions. It must include the columns ‘Regulated ID’, ‘Regulator ID’, ‘Paper IDs’
- Returns
duplicate_counts – A dataframe giving the number of occurrences of each duplicated interaction in a reading set
- Return type
pd.DataFrame
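The duplicate rule described above maps naturally onto a pandas group-by over the three key columns. The following is a minimal sketch of that idea, not the FLUTE implementation. The helper `count_duplicates`, the output column name "Count", and the choice to report only interactions seen more than once are assumptions.

```python
import pandas as pd

KEYS = ["Regulated ID", "Regulator ID", "Paper IDs"]

def count_duplicates(ints: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count repeated (regulated, regulator, paper) triples."""
    counts = ints.groupby(KEYS).size().reset_index(name="Count")
    # Assumption: only interactions that occur more than once are reported.
    return counts[counts["Count"] > 1].reset_index(drop=True)

# Placeholder IDs: the first triple appears three times, the last one twice.
ints = pd.DataFrame(
    [
        ["a", "b", "p1"],
        ["a", "b", "p1"],
        ["a", "b", "p1"],
        ["a", "b", "p2"],
        ["c", "d", "p1"],
        ["c", "d", "p1"],
    ],
    columns=KEYS,
)
duplicate_counts = count_duplicates(ints)
```

The same pair reported in a different paper ("a", "b", "p2") is treated as a separate interaction, matching the three-column definition of a duplicate.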
Query Functions
- class run_FLUTE.Query(user, password, host, database)
- __init__(user, password, host, database)
Initialize query with credentials and configuration settings.
- Parameters
user (str) – Name of the MySQL user where the FLUTE DB is stored, usually ‘root’
password (str) – Password for the MySQL user where the FLUTE DB is stored.
host (str) – Host name of the machine where the copy of the FLUTE DB is stored, usually ‘localhost’
database (str) – Name of the local copy of the FLUTE DB, usually ‘flute’
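Because the constructor takes a password, it can be convenient to collect the four arguments outside the source code. The sketch below reads them from environment variables and falls back to the usual values listed above. The helper `flute_connection_settings` and the variable names `FLUTE_DB_USER`, `FLUTE_DB_PASSWORD`, `FLUTE_DB_HOST`, and `FLUTE_DB_NAME` are hypothetical; FLUTE itself does not define them.

```python
import os

def flute_connection_settings(env=os.environ):
    """Hypothetical helper: gather the four Query arguments from a mapping."""
    return {
        "user": env.get("FLUTE_DB_USER", "root"),
        "password": env.get("FLUTE_DB_PASSWORD", ""),
        "host": env.get("FLUTE_DB_HOST", "localhost"),
        "database": env.get("FLUTE_DB_NAME", "flute"),
    }

# A plain dict stands in for the environment so the example is reproducible.
settings = flute_connection_settings({"FLUTE_DB_PASSWORD": "example"})

# With run_FLUTE importable and a MySQL server holding the FLUTE DB running,
# the settings map directly onto the constructor arguments:
# from run_FLUTE import Query
# query = Query(**settings)
```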
- filter_pt_ints_by_scoring(pt_only_ints: pandas.core.frame.DataFrame, score_tuple: tuple) -> pandas.core.frame.DataFrame
This function further filters protein-only interactions against multiple tables in the database, subject to the score tuple. It returns a DataFrame of scored protein-only interactions.
- Parameters
pt_only_ints (pd.DataFrame) – protein-only interactions with name/id/uid/stringID/GoID/CIDm etc. information filled out
score_tuple (tuple) – a tuple of three numbers, denoting thresholds of escore, tscore, dscore
- Returns
pt_scored_ints – interactions further filtered by database entries and score thresholds
- Return type
pd.DataFrame
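The role of `score_tuple` can be illustrated without a database by applying three thresholds to a table that already carries the scores. The sketch below is only that illustration. It assumes the columns are named escore, tscore, and dscore, and that an interaction is kept when all three scores meet their thresholds; the actual combination rule used by FLUTE is not stated here, and the helper `apply_score_thresholds` is hypothetical.

```python
import pandas as pd

def apply_score_thresholds(scored: pd.DataFrame, score_tuple: tuple) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose three scores all meet their thresholds."""
    e_min, t_min, d_min = score_tuple
    # Assumption: thresholds are inclusive and combined with a logical AND.
    mask = (
        (scored["escore"] >= e_min)
        & (scored["tscore"] >= t_min)
        & (scored["dscore"] >= d_min)
    )
    return scored[mask].reset_index(drop=True)

# Placeholder scores for three interactions.
scored = pd.DataFrame({
    "Regulated ID": ["id1", "id2", "id3"],
    "Regulator ID": ["id4", "id5", "id6"],
    "escore": [900, 100, 500],
    "tscore": [800, 900, 500],
    "dscore": [700, 900, 100],
})
kept = apply_score_thresholds(scored, (400, 400, 400))
```

Setting a threshold to 0 in this sketch disables that criterion, which is one way to filter on a single score type.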
- filtered_input_ints(f_in, score_tuple, output_path)
This function uses the input interaction file and a tuple of score thresholds to generate all scored protein-related interactions and the filtered scored interactions from the input file
- Parameters
f_in (str) – The path of the input interaction file (.xlsx), preferably in BioRECIPE format. Minimum required column names are [‘Regulated Name’, ‘Regulated ID’, ‘Regulated Type’, ‘Regulator Name’, ‘Regulator ID’, ‘Regulator Type’, ‘Paper IDs’]
score_tuple (tuple) – a tuple of three numbers, denoting thresholds of escore, tscore, dscore
output_path (str) – Path prefix for the two output files:
1) <output_path>_grd_ints_scores.xlsx: contains all scored protein-related interactions
2) <output_path>_filtered.xlsx: contains the filtered interactions from the input file
This function retrieves related papers based on a protein ID and saves a file of related paper IDs.
- Parameters
year_df (pd.DataFrame) – A DataFrame with Year, PMCID, PMID information
prot (str) – Input protein ID to retrieve related papers. Multiple IDs can be separated with ‘,’
- Returns
fp_list – the list of paper PMCIDs that are related to the queried protein IDs
- Return type
list
- get_same_papers_ints(file_in, year_df)
This function retrieves interactions from the same papers as the input spreadsheet
- Parameters
file_in (str) – File name of a spreadsheet (.xlsx) containing a column named “Paper IDs”
year_df (pd.DataFrame) – A DataFrame with Year, PMCID, PMID information
- Returns
ints_same_pp – Array of interactions that occur in the same papers as the input file. Each interaction has the format (protein1’s external ID, protein2’s external ID, mode, source in PMID)
- Return type
np.array
- ground_string_id(df: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
This function uses the FLUTE database to ground each species name and ID to a stringID.
- Parameters
df (pd.DataFrame) – DataFrame containing the columns ‘Name’ and ‘ID’ for all studied species
- Returns
df – The input DataFrame with a new column ‘stringID’ added for all studied species
- Return type
pd.DataFrame
Dependencies
pandas library
numpy library
matplotlib library
networkx library
MySQL Connector for Python3 library