Functions in FLUTE

This page describes the functions supported by the FLUTE tool: general functions that run without accessing the database, and query functions that access the flute.sql database.

General Functions

run_FLUTE.filter_protein_ints(all_ints_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function returns interactions that involve a protein as either the regulator or the regulated element.

Parameters

all_ints_df (pd.DataFrame) – All interactions from the input file, as a DataFrame. It must include the following columns: ‘Regulated Name’, ‘Regulated ID’, ‘Regulated Type’, ‘Regulator Name’, ‘Regulator ID’, ‘Regulator Type’

Returns

pt_only_ints – Interactions that involve a protein as either the regulator or the regulated element, formatted with only the name (lowercase) and ID columns.

Return type

pd.DataFrame
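
The filtering described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the FLUTE implementation: the type label `protein`, the case-insensitive match, the helper name `sketch_filter_protein_ints`, and the toy rows are all assumptions made for the example.

```python
import pandas as pd

def sketch_filter_protein_ints(all_ints_df: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where the regulator or the regulated element is a protein."""
    # Assumption: protein elements carry the type label "protein" (any case).
    regulated_is_pt = all_ints_df["Regulated Type"].str.lower() == "protein"
    regulator_is_pt = all_ints_df["Regulator Type"].str.lower() == "protein"
    kept = all_ints_df[regulated_is_pt | regulator_is_pt]

    # Keep only the name and ID columns, with names lowercased.
    columns = ["Regulated Name", "Regulated ID", "Regulator Name", "Regulator ID"]
    out = kept[columns].copy()
    out["Regulated Name"] = out["Regulated Name"].str.lower()
    out["Regulator Name"] = out["Regulator Name"].str.lower()
    return out.reset_index(drop=True)

# Toy input with the six required columns (illustrative rows only).
example_ints = pd.DataFrame({
    "Regulated Name": ["AKT1", "glucose", "apoptosis"],
    "Regulated ID": ["P31749", "5793", "GO:0006915"],
    "Regulated Type": ["protein", "chemical", "biological process"],
    "Regulator Name": ["PIK3CA", "ATP", "TP53"],
    "Regulator ID": ["P42336", "5957", "P04637"],
    "Regulator Type": ["protein", "chemical", "protein"],
})

pt_only_example = sketch_filter_protein_ints(example_ints)
```

In this toy input, the chemical-chemical row is dropped because neither side is a protein, while the other two rows are kept.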

run_FLUTE.get_chem_id(pt_only_ints: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function adds CIDm information to protein-chemical interactions

Parameters

pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with CIDm information

Returns

pt_only_ints – Interaction dataframe with CIDm information filled out

Return type

pd.DataFrame

run_FLUTE.get_go_id(pt_only_ints: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function adds GO information to protein-biological process interactions

Parameters

pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with GO information

Returns

pt_only_ints – Interaction dataframe with GO information filled out

Return type

pd.DataFrame

run_FLUTE.get_string_id(pt_only_ints: pandas.core.frame.DataFrame, id_stringid_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function adds string_id information to the regulated element and the regulator in each interaction

Parameters
  • pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with string_id information

  • id_stringid_df (pd.DataFrame) – Species dataframe that links id and string_id information

Returns

pt_only_ints – Interaction dataframe with string_id information filled out

Return type

pd.DataFrame

run_FLUTE.get_uid(pt_only_ints: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function adds uid information to the regulated element and the regulator in each interaction. The uid is the same as the ID when a corresponding stringID exists.

Parameters

pt_only_ints (pd.DataFrame) – Interaction dataframe to be populated with uid information

Returns

pt_only_ints – Interaction dataframe with uid information filled out

Return type

pd.DataFrame

run_FLUTE.extract_year(input_path)

Load the input OA_file into a pd.DataFrame with Year, PMCID, PMID columns

Parameters

input_path (str) – Path to input OA_file

Returns

year_df – A dataframe with Year, PMCID, PMID columns

Return type

pd.DataFrame

run_FLUTE.filter_recent_ints(f_in, year_df, x=5)

This function searches the input interaction file and finds all interactions occurring in papers less than X years old.

Parameters
  • f_in (str) – Input filename containing the list of interactions; it must include a column “Paper IDs”

  • year_df (pd.DataFrame) – A dataframe with Year, PMCID, PMID columns

  • x (int) – Number of years; default is 5

Returns

df – A dataframe containing all filtered interactions occurring in papers less than X years old

Return type

pd.DataFrame
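
The recency filter can be sketched as a join between the interactions and the year table. This is a rough sketch, not the FLUTE code: it takes a DataFrame rather than a filename so that it stays self-contained, it assumes each “Paper IDs” cell holds a single PMCID, and it introduces a hypothetical `current_year` argument so the result is reproducible. The paper IDs shown are placeholders.

```python
import pandas as pd

def sketch_filter_recent(ints_df, year_df, x=5, current_year=2024):
    """Keep interactions whose paper is less than x years old."""
    # Assumption: "Paper IDs" holds one PMCID per row, matching year_df["PMCID"].
    merged = ints_df.merge(
        year_df[["PMCID", "Year"]],
        left_on="Paper IDs",
        right_on="PMCID",
        how="inner",  # interactions whose paper has no known year are dropped
    )
    recent = merged[(current_year - merged["Year"]) < x]
    return recent.drop(columns=["PMCID"]).reset_index(drop=True)

# Toy data (placeholder IDs, illustrative only).
toy_ints = pd.DataFrame({
    "Regulated ID": ["P31749", "P04637", "P42336"],
    "Regulator ID": ["P42336", "P31749", "P04637"],
    "Paper IDs": ["PMC1", "PMC2", "PMC3"],
})
toy_years = pd.DataFrame({
    "Year": [2023, 2015],
    "PMCID": ["PMC1", "PMC2"],
    "PMID": ["1", "2"],
})

recent_ints = sketch_filter_recent(toy_ints, toy_years, x=5, current_year=2024)
```

Only the interaction from the 2023 paper survives here: the 2015 paper is too old, and the third paper has no entry in the year table.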

run_FLUTE.get_duplicates_ints(f_in)

This function calculates the number of duplicated occurrences of an interaction in a reading set. Interactions with the same Regulated ID, Regulator ID, and Paper ID are considered duplicates.

Parameters

f_in (str) – Input filename containing the list of interactions; it must include the columns ‘Regulated ID’, ‘Regulator ID’, ‘Paper IDs’

Returns

duplicate_counts – A dataframe giving the number of occurrences of each duplicated interaction in a reading set

Return type

pd.DataFrame
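
Counting duplicates on the three key columns maps naturally onto a pandas group-by. The sketch below is illustrative only: the output column name `Count`, the choice to report only keys that occur more than once, and the helper name are assumptions, and it works on a DataFrame instead of reading a file.

```python
import pandas as pd

def sketch_duplicate_counts(ints_df):
    """Count how often each (Regulated ID, Regulator ID, Paper IDs) key occurs."""
    keys = ["Regulated ID", "Regulator ID", "Paper IDs"]
    counts = ints_df.groupby(keys).size().reset_index(name="Count")
    # Assumption: only keys seen more than once are reported as duplicates.
    return counts[counts["Count"] > 1].reset_index(drop=True)

# Toy reading set (placeholder paper IDs, illustrative only).
toy_reading_set = pd.DataFrame({
    "Regulated ID": ["P31749", "P31749", "P31749", "P04637"],
    "Regulator ID": ["P42336", "P42336", "P42336", "P31749"],
    "Paper IDs": ["PMC1", "PMC1", "PMC2", "PMC1"],
})

dup_counts = sketch_duplicate_counts(toy_reading_set)
```

The first two rows share all three keys, so they form the single duplicated interaction; the third row differs in its paper ID and is counted separately.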

Query Functions

class run_FLUTE.Query(user, password, host, database)
__init__(user, password, host, database)

Initialize query with credentials and configuration settings.

Parameters
  • user (str) – Name of the MySQL user where the FLUTE DB is stored, usually ‘root’

  • password (str) – Password for the MySQL user where the FLUTE DB is stored.

  • host (str) – Host name for the local machine where the copy of the FLUTE DB is stored, usually ‘localhost’

  • database (str) – Name of the local copy of the FLUTE DB, usually ‘flute’
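
One way to assemble these four settings without hard-coding a password is to read them from the environment and fall back to the usual values listed above. This is only a sketch: the environment variable names (`FLUTE_DB_USER` and so on) and the helper `flute_query_settings` are hypothetical and not part of FLUTE.

```python
import os

def flute_query_settings(env=None):
    """Collect Query constructor arguments, using the usual defaults."""
    # Assumption: these environment variable names are made up for this sketch.
    env = os.environ if env is None else env
    return {
        "user": env.get("FLUTE_DB_USER", "root"),
        "password": env.get("FLUTE_DB_PASSWORD", ""),
        "host": env.get("FLUTE_DB_HOST", "localhost"),
        "database": env.get("FLUTE_DB_NAME", "flute"),
    }

# A plain dict stands in for os.environ so the example is reproducible.
settings = flute_query_settings({"FLUTE_DB_PASSWORD": "example-password"})

# With FLUTE installed and MySQL running, the settings could be passed on:
# from run_FLUTE import Query
# query = Query(**settings)
```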

filter_pt_ints_by_scoring(pt_only_ints: pandas.core.frame.DataFrame, score_tuple: tuple) → pandas.core.frame.DataFrame

This function further filters protein-only interactions against multiple tables in the database, subject to the score tuple. It returns a DataFrame of scored protein-only interactions.

Parameters
  • pt_only_ints (pd.DataFrame) – protein-only interactions with name/id/uid/stringID/GoID/CIDm etc. information filled out

  • score_tuple (tuple) – a tuple of three numbers, denoting thresholds of escore, tscore, dscore

Returns

pt_scored_ints – interactions further filtered by database entries and score thresholds

Return type

pd.DataFrame
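
To make the role of the score tuple concrete, the sketch below applies three thresholds to a table that already carries scores. It does not query the database and is not the FLUTE implementation: the column names `escore`, `tscore`, `dscore`, the rule that an interaction is kept only when all three scores meet their thresholds, and the toy score values are assumptions.

```python
import pandas as pd

def sketch_apply_score_thresholds(scored_df, score_tuple):
    """Keep rows whose three scores all meet the given thresholds."""
    es, ts, ds = score_tuple  # thresholds for escore, tscore, dscore
    # Assumption: all three conditions must hold for a row to be kept.
    mask = (
        (scored_df["escore"] >= es)
        & (scored_df["tscore"] >= ts)
        & (scored_df["dscore"] >= ds)
    )
    return scored_df[mask].reset_index(drop=True)

# Toy scored interactions (illustrative values only).
toy_scored = pd.DataFrame({
    "Regulated ID": ["P31749", "P04637", "P42336"],
    "Regulator ID": ["P42336", "P31749", "P04637"],
    "escore": [900, 100, 700],
    "tscore": [400, 950, 700],
    "dscore": [800, 0, 700],
})

kept_scored = sketch_apply_score_thresholds(toy_scored, (500, 300, 500))
```

With thresholds (500, 300, 500), the second row is dropped because its escore and dscore fall below the cut-offs.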

filtered_input_ints(f_in, score_tuple, output_path)

This function uses the input interaction file and a tuple of score thresholds to generate all scored protein-related interactions and the filtered scored interactions from the input file.

Parameters
  • f_in (str) – The path of the input interaction file (.xlsx), preferably in BioRECIPE format. Minimum required column names include [‘Regulated Name’, ‘Regulated ID’, ‘Regulated Type’, ‘Regulator Name’, ‘Regulator ID’, ‘Regulator Type’, ‘Paper IDs’]

  • score_tuple (tuple) – a tuple of three numbers, denoting thresholds of escore, tscore, dscore

  • output_path (str) – specify output path to store the final output: 1) <output_path>_grd_ints_scores.xlsx : contains all scored protein-related interactions 2) <output_path>_filtered.xlsx : contains filtered interactions from the input file
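
Because output_path acts as a prefix rather than a full filename, the two resulting file names can be derived as shown below. The helper name and the example prefix are hypothetical; only the two suffixes come from the parameter description above.

```python
def sketch_output_files(output_path):
    """Return the two file names derived from the output_path prefix."""
    return {
        # All scored protein-related interactions.
        "scored": f"{output_path}_grd_ints_scores.xlsx",
        # Filtered interactions from the input file.
        "filtered": f"{output_path}_filtered.xlsx",
    }

output_files = sketch_output_files("results/run1")
```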

This function retrieves related papers based on a protein ID and saves a file of related paper IDs.

Parameters
  • year_df (pd.DataFrame) – A DataFrame with Year, PMCID, PMID information

  • prot (str) – Input protein ID to retrieve related papers. ‘,’ can be used to split multiple IDs

Returns

fp_list – the list of paper PMCIDs that are related to inquired protein IDs

Return type

list
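
Since prot accepts several IDs in one comma-separated string, a caller may want to see how such a string breaks down into individual IDs. The snippet below is a sketch only: the helper name is hypothetical, and trimming whitespace and skipping empty entries are assumptions about reasonable handling, not documented FLUTE behavior.

```python
def sketch_split_protein_ids(prot):
    """Split a comma-separated protein ID string into a list of IDs."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [part.strip() for part in prot.split(",") if part.strip()]

protein_ids = sketch_split_protein_ids("P31749, P04637")
```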

get_same_papers_ints(file_in, year_df)

This function retrieves interactions from the same papers as the input spreadsheet

Parameters
  • file_in (str) – File name of a spreadsheet (.xlsx) containing a column named “Paper IDs”

  • year_df (pd.DataFrame) – A DataFrame with Year, PMCID, PMID information

Returns

ints_same_pp – Array of interactions that occur in the same papers as the input file. Each interaction has the format (protein1’s external ID, protein2’s external ID, mode, source in PMID)

Return type

np.array

ground_string_id(df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function uses the flute database to ground each species name and ID to its stringID.

Parameters

df (pd.DataFrame) – DataFrame containing the ‘Name’ and ‘ID’ columns for all studied species

Returns

df – New column ‘stringID’ added for all studied species

Return type

pd.DataFrame

Dependencies