API
Downloading dataset
- class dtp.Downloader(data_storage_config, gen3_config=None, data_storage_type='pacs')
- download_dataset(dataset_id, dest)
Downloading a dataset (data from PACS/iRODS and metadata from Gen3) in SDS format
- Parameters:
dataset_id (str) – Dataset id/name on Gen3
dest (str) – Path to the save folder
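A minimal sketch of batch downloading, built only on the documented `download_dataset(dataset_id, dest)` call. The helper name `download_all` and the one-subfolder-per-dataset layout are assumptions made for illustration, not part of dtp; whether `download_dataset` needs the destination folder to exist beforehand is not stated above, so the sketch creates it defensively.

```python
from pathlib import Path


def download_all(downloader, dataset_ids, dest_root):
    """Download each dataset into its own subfolder of dest_root.

    `downloader` is expected to be a dtp.Downloader instance (or any object
    exposing download_dataset(dataset_id, dest)). This helper is hypothetical.
    """
    saved = {}
    for dataset_id in dataset_ids:
        dest = Path(dest_root) / dataset_id
        # Assumption: creating the folder up front is harmless for the downloader.
        dest.mkdir(parents=True, exist_ok=True)
        # dest is documented as a string, so convert the Path explicitly.
        downloader.download_dataset(dataset_id, str(dest))
        saved[dataset_id] = dest
    return saved


# Possible usage (configuration values depend on your deployment):
#   downloader = dtp.Downloader(data_storage_config, gen3_config=gen3_config)
#   download_all(downloader, ["dataset-1", "dataset-2"], "./downloads")
```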
Interacting with metadata
- class dtp.Auth(endpoint=None, refresh_file=None, refresh_token=None, idp=None, client_credentials=None, client_scopes=None)
Class for Gen3 authentication, inherited from the official Gen3 API's Gen3 Auth Helper (https://gen3sdk-python.readthedocs.io/en/latest/auth.html).
Gen3 auth helper class for use with requests auth.
Implements requests.auth.AuthBase in order to support JWT authentication. Generates access tokens from the provided refresh token file or string. Automatically refreshes access tokens when they expire.
- Args:
refresh_file (str, opt): The file containing the downloaded JSON web token. Optional if working in a Gen3 Workspace. Defaults to (env["GEN3_API_KEY"] || "credentials") if refresh_token and idp are not set. Includes ~/.gen3/ in the search path if the value does not include /. Interprets "idp://wts/<idp>" as an idp. Interprets "accesstoken:///<token>" as an access token.
refresh_token (str, opt): The JSON web token. Optional if working in a Gen3 Workspace.
idp (str, opt): If working in a Gen3 Workspace, the IDP to use can be specified; "local" indicates the local environment fence idp.
client_credentials (tuple, opt): The (client_id, client_secret) credentials for an OIDC client that has the 'client_credentials' grant, allowing it to obtain access tokens.
client_scopes (str, opt): Space-separated list of scopes requested for access tokens obtained from client credentials. Default: "user data openid".
- Examples:
This generates the Gen3Auth class pointed at the sandbox commons while using the credentials.json downloaded from the commons profile page and installed in ~/.gen3/credentials.json
>>> auth = Gen3Auth()
or use ~/.gen3/crdc.json:
>>> auth = Gen3Auth(refresh_file="crdc")
or use some arbitrary file:
>>> auth = Gen3Auth(refresh_file="./key.json")
or set the GEN3_API_KEY environment variable rather than pass the refresh_file argument to the Gen3Auth constructor.
If working with an OIDC client that has the ‘client_credentials’ grant, allowing it to obtain access tokens, provide the client ID and secret:
Note: client secrets should never be hardcoded!
>>> auth = Gen3Auth(
...     endpoint="https://datacommons.example",
...     client_credentials=("client ID", os.environ["GEN3_OIDC_CLIENT_CREDS_SECRET"]),
... )
If working in a Gen3 Workspace, initialize as follows:
>>> auth = Gen3Auth()
- curl(path, request=None, data=None)
Curl the given endpoint, e.g. gen3 curl /user/user. Returns a requests.Response.
- Args:
path (str): path under the commons to curl (/user/user, /index/index, /authz/mapping, …)
request (str in GET|POST|PUT|DELETE): defaults to GET if data is not set, else defaults to POST
data (str): json string or "@filename" of a json file
- get_access_token()
Get the access token - auto refresh if within 5 minutes of expiration
- get_access_token_from_wts(endpoint=None)
Try to fetch an access token for the given idp from the wts in the given namespace. If idp is not set, then default to “local”
- refresh_access_token(endpoint=None)
Get a new access token
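To make the "auto refresh if within 5 minutes of expiration" behaviour of `get_access_token()` concrete, the sketch below shows one common way such a check can be done: read the `exp` claim from the JWT payload and compare it with the current time. This is an illustration under stated assumptions, not the library's actual implementation; the names `jwt_expiry` and `needs_refresh` are hypothetical, and the signature is deliberately not verified.

```python
import base64
import json
import time

# "Within 5 minutes of expiration", as described for get_access_token().
REFRESH_WINDOW_SECONDS = 5 * 60


def jwt_expiry(token):
    """Return the `exp` claim (Unix seconds) of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]


def needs_refresh(token, now=None):
    """True if the token expires in less than REFRESH_WINDOW_SECONDS."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now < REFRESH_WINDOW_SECONDS
```

In normal use none of this is needed: calling `get_access_token()` is enough, and `refresh_access_token()` forces a new token regardless of expiry.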
- class dtp.Gen3Convertor(project, experiment, version='2.0.0')
Converting the metadata from SPARC dataset structure (SDS) to Gen3 submittable structure in json format
- execute(source_dir, dest_dir)
Converting metadata
- Parameters:
source_dir (str or pathlib.Path object) – Path to the source (SDS) directory
dest_dir (str or pathlib.Path object) – Path to the destination (Gen3) directory
- static read_excel(path, sheet_name=None)
Reading Excel data as a Python dataframe object
- Parameters:
path (str or pathlib.Path object) – Path to the Excel file
sheet_name (str) – Excel sheet name
- Returns:
Data in dataframe object format
- Return type:
object
- set_schema_dir(path)
Setting the SDS schema directory
- Parameters:
path (str) – Path to the SDS schema directory
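The following sketch shows how `set_schema_dir(path)` and `execute(source_dir, dest_dir)` might be combined to convert several SDS datasets in one pass. The helper `convert_all` and the assumed folder layout (one SDS dataset per subdirectory of `sds_root`) are illustrative assumptions, not part of dtp.

```python
from pathlib import Path


def convert_all(convertor, sds_root, gen3_root, schema_dir=None):
    """Convert every SDS dataset folder under sds_root.

    `convertor` is expected to be a dtp.Gen3Convertor instance (or any object
    exposing set_schema_dir(path) and execute(source_dir, dest_dir)).
    """
    if schema_dir is not None:
        # set_schema_dir documents its argument as a str.
        convertor.set_schema_dir(str(schema_dir))

    converted = []
    # Assumption: each immediate subdirectory of sds_root is one SDS dataset.
    for source_dir in sorted(p for p in Path(sds_root).iterdir() if p.is_dir()):
        dest_dir = Path(gen3_root) / source_dir.name
        # execute accepts str or pathlib.Path for both arguments.
        convertor.execute(source_dir, dest_dir)
        converted.append(dest_dir)
    return converted


# Possible usage (project and experiment names are placeholders):
#   convertor = dtp.Gen3Convertor(project="my-project", experiment="my-experiment")
#   convert_all(convertor, "./sds", "./gen3")
```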
- class dtp.Exporter(auth)
Class for exporting Gen3 metadata
- export_node(program, project, node_type, fileformat, filename=None)
Exporting all records in a single Gen3 node
- Parameters:
program (str) – Program name
project (str) – Project
node_type (str) – Node name
fileformat (str) – Exported file format (json or tsv)
filename (str) – Exported filename
- Returns:
List of records (metadata) in dictionary format
- Return type:
list
- export_record(program, project, uuid, fileformat, filename=None)
Exporting the metadata in a single record
- Parameters:
program (str) – Program name
project (str) – Project
uuid (str) – Record UUID
fileformat (str) – Exported file format (json or tsv)
filename (str) – Exported filename
- Returns:
Metadata in a single record
- Return type:
dict
- save(data, fileformat, path)
Saving the metadata (dict) in json format
- Parameters:
data (dict) – metadata
fileformat (str) – file format (currently only json)
path (str) – Path to the save file
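As a rough illustration of `export_node`, the sketch below exports several nodes of one project to JSON and reports how many records each returned. The helper `export_nodes` and the `<node_type>.json` naming scheme are assumptions for illustration; the node names passed in depend on the data dictionary of the Gen3 instance.

```python
def export_nodes(exporter, program, project, node_types):
    """Export each node to '<node_type>.json' and return record counts.

    `exporter` is expected to be a dtp.Exporter instance (or any object
    exposing export_node(program, project, node_type, fileformat, filename)).
    """
    counts = {}
    for node_type in node_types:
        # export_node is documented to return a list of records (dicts).
        records = exporter.export_node(
            program, project, node_type, "json", filename=f"{node_type}.json"
        )
        counts[node_type] = len(records)
    return counts


# Possible usage (names are placeholders):
#   auth = dtp.Auth(refresh_file="./key.json")
#   exporter = dtp.Exporter(auth)
#   export_nodes(exporter, "my-program", "my-project", ["case", "sample"])
```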
- class dtp.Querier(auth)
Class for querying Gen3. Also accepts queries in GraphQL syntax.
- get_node_records(node, program, project)
Getting all the records in a Gen3 node
- Parameters:
node (str) – Name of the target node
program (str) – program name
project (str) – project name
- Returns:
A list of records in dictionary format
- Return type:
list
- get_programs()
Getting all programs that the user has access to
- Returns:
List of programs
- Return type:
list
- get_projects(program)
Getting the projects by program name
- Parameters:
program (str) – Name of a Gen3 program
- Returns:
List of projects
- Return type:
list
- graphql_query(query_string, variables=None)
Sending a GraphQL query to Gen3
- Parameters:
query_string (str) – query in GraphQL syntax
variables (dict) – query variables (optional)
- Returns:
query response
- Return type:
dict
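The Querier methods above can be combined in two ways, sketched below. `count_records` walks programs and projects with the helper methods, assuming that `get_programs()` and `get_projects()` return names that can be passed straight back in as arguments (the documentation only says they return lists). `first_records` sends a parameterised GraphQL query; the node `case` and the field `submitter_id` are assumptions that depend on the data dictionary of the Gen3 instance. Both function names are hypothetical.

```python
# Illustrative query: `case` and `submitter_id` are assumed to exist in the
# data dictionary; replace them with a node and field from your own commons.
FIRST_RECORDS_QUERY = """
query ($n: Int) {
  case(first: $n) {
    submitter_id
  }
}
"""


def count_records(querier, node):
    """Count records of `node` in every (program, project) the user can see."""
    counts = {}
    for program in querier.get_programs():
        # Assumption: list items are plain program/project names.
        for project in querier.get_projects(program):
            records = querier.get_node_records(node, program, project)
            counts[(program, project)] = len(records)
    return counts


def first_records(querier, n):
    """Send the illustrative query with `n` bound to the $n variable."""
    return querier.graphql_query(FIRST_RECORDS_QUERY, variables={"n": n})


# Possible usage:
#   querier = dtp.Querier(dtp.Auth(refresh_file="./key.json"))
#   count_records(querier, "case")
#   first_records(querier, 10)
```

Passing values through `variables` rather than formatting them into the query string keeps the query text constant and avoids quoting mistakes.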