API

Downloading dataset

class dtp.Downloader(data_storage_config, gen3_config=None, data_storage_type='pacs')
download_dataset(dataset_id, dest)

Downloading a dataset in SDS format, including its data from PACS/iRODS and its metadata from Gen3

Parameters:
  • dataset_id (str) – Dataset id/name on Gen3

  • dest (str) – Path to the save folder
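A minimal usage sketch, assuming a dtp.Downloader has already been constructed with a storage configuration that is valid for your site (the configuration format is not described here, so it is left out). The helper name fetch_dataset and the dataset id in the comment are hypothetical; only download_dataset(dataset_id, dest) comes from the reference above.

```python
from pathlib import Path


def fetch_dataset(downloader, dataset_id, dest):
    """Create the save folder if needed, then download the dataset into it.

    `downloader` is expected to be a dtp.Downloader (or any object exposing
    download_dataset(dataset_id, dest)). `fetch_dataset` is a hypothetical
    helper and not part of dtp.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # dest is documented as a str, so the path is passed as a string.
    downloader.download_dataset(dataset_id, str(dest_path))
    return dest_path


# Intended use (not executed here, requires dtp and valid configuration):
#   downloader = dtp.Downloader(data_storage_config, gen3_config)
#   fetch_dataset(downloader, "my-dataset", "./downloads/my-dataset")
```

Creating the folder up front is a design choice of this sketch; whether download_dataset creates it on its own is not stated in the reference.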

Interacting with metadata

class dtp.Auth(endpoint=None, refresh_file=None, refresh_token=None, idp=None, client_credentials=None, client_scopes=None)

Class for Gen3 authentication, inherited from the official Gen3 API’s Gen3 Auth Helper (https://gen3sdk-python.readthedocs.io/en/latest/auth.html), a helper class for use with requests auth.

Implements requests.auth.AuthBase in order to support JWT authentication. Generates access tokens from the provided refresh token file or string. Automatically refreshes access tokens when they expire.

Args:
refresh_file (str, opt): The file containing the downloaded JSON web token. Optional if working in a Gen3 Workspace. Defaults to (env[“GEN3_API_KEY”] || “credentials”) if refresh_token and idp not set. Includes ~/.gen3/ in search path if value does not include /. Interprets “idp://wts/<idp>” as an idp. Interprets “accesstoken:///<token>” as an access token.

refresh_token (str, opt): The JSON web token. Optional if working in a Gen3 Workspace.

idp (str, opt): If working in a Gen3 Workspace, the IDP to use can be specified. “local” indicates the local environment fence idp.

client_credentials (tuple, opt): The (client_id, client_secret) credentials for an OIDC client that has the ‘client_credentials’ grant, allowing it to obtain access tokens.

client_scopes (str, opt): Space-separated list of scopes requested for access tokens obtained from client credentials. Default: “user data openid”
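The interpretation rules for refresh_file can be sketched as a small classifier. This is an illustration of the documented rules only, not the Gen3 SDK's own code: the function name interpret_refresh_file is hypothetical, it ignores the refresh_token and idp arguments, it returns only the ~/.gen3/ candidate for bare names, and appending ".json" to a bare name is an assumption inferred from the "crdc" example in this section.

```python
import os
from pathlib import Path


def interpret_refresh_file(refresh_file=None, env=None):
    """Classify a refresh_file value following the documented rules.

    Returns a (kind, value) tuple where kind is "idp", "access_token"
    or "file". Illustrative only.
    """
    env = os.environ if env is None else env
    if refresh_file is None:
        # Documented default: env["GEN3_API_KEY"], else "credentials".
        refresh_file = env.get("GEN3_API_KEY") or "credentials"
    if refresh_file.startswith("idp://wts/"):
        return ("idp", refresh_file[len("idp://wts/"):])
    if refresh_file.startswith("accesstoken:///"):
        return ("access_token", refresh_file[len("accesstoken:///"):])
    if "/" not in refresh_file:
        # Bare names are looked up under ~/.gen3/. Adding ".json" is an
        # assumption based on the "crdc" -> ~/.gen3/crdc.json example.
        name = refresh_file
        if not name.endswith(".json"):
            name += ".json"
        return ("file", str(Path.home() / ".gen3" / name))
    # Anything containing "/" is treated as a path to use as given.
    return ("file", refresh_file)
```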

Examples:

This generates the Gen3Auth class pointed at the sandbox commons while using the credentials.json downloaded from the commons profile page and installed in ~/.gen3/credentials.json

>>> auth = Gen3Auth()

or use ~/.gen3/crdc.json:

>>> auth = Gen3Auth(refresh_file="crdc")

or use some arbitrary file:

>>> auth = Gen3Auth(refresh_file="./key.json")

or set the GEN3_API_KEY environment variable rather than pass the refresh_file argument to the Gen3Auth constructor.

If working with an OIDC client that has the ‘client_credentials’ grant, allowing it to obtain access tokens, provide the client ID and secret:

Note: client secrets should never be hardcoded!

>>> import os
>>> auth = Gen3Auth(
...     endpoint="https://datacommons.example",
...     client_credentials=("client ID", os.environ["GEN3_OIDC_CLIENT_CREDS_SECRET"])
... )

If working in a Gen3 Workspace, initialize as follows:

>>> auth = Gen3Auth()

curl(path, request=None, data=None)

Curl the given endpoint, e.g. gen3 curl /user/user. Returns a requests.Response.

Args:

path (str): path under the commons to curl (/user/user, /index/index, /authz/mapping, …)

request (str in GET|POST|PUT|DELETE): default to GET if data is not set, else default to POST

data (str): json string or “@filename” of a json file
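The defaulting rules for request and data can be sketched as follows. This is a hypothetical helper (resolve_curl_arguments is not part of dtp or the Gen3 SDK) that only mirrors the documented behaviour: GET when no data is given, POST otherwise, and “@filename” meaning the JSON body is read from a file. Validating the body with json.loads is an extra step added by this sketch.

```python
import json


def resolve_curl_arguments(request=None, data=None):
    """Return (method, body) following the defaults documented for curl()."""
    # Default method: GET when no data is given, otherwise POST.
    method = request or ("POST" if data is not None else "GET")
    method = method.upper()
    if method not in {"GET", "POST", "PUT", "DELETE"}:
        raise ValueError(f"unsupported request method: {method}")
    body = data
    if isinstance(data, str) and data.startswith("@"):
        # "@filename" means: read the JSON body from that file.
        with open(data[1:], encoding="utf-8") as handle:
            body = handle.read()
    if body is not None:
        # Sketch-only check: raises ValueError if the body is not valid JSON.
        json.loads(body)
    return method, body
```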

get_access_token()

Get the access token - auto refresh if within 5 minutes of expiration
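The "within 5 minutes of expiration" rule amounts to comparing the token's exp claim with the current time. The sketch below shows one way to express that check with the standard library; the function names are hypothetical, the token is decoded without verifying its signature, and this is not presented as the SDK's actual implementation.

```python
import base64
import json
import time


def jwt_expiry(token):
    """Read the `exp` claim (seconds since the epoch) from a JWT, unverified."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return payload["exp"]


def needs_refresh(token, now=None, margin_seconds=300):
    """True when the token expires within `margin_seconds` (5 minutes here)."""
    now = time.time() if now is None else now
    return jwt_expiry(token) - now < margin_seconds
```

A caller would request a new token (for example through refresh_access_token) whenever needs_refresh returns True.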

get_access_token_from_wts(endpoint=None)

Try to fetch an access token for the given idp from the wts in the given namespace. If idp is not set, then default to “local”

refresh_access_token(endpoint=None)

Get a new access token

class dtp.Gen3Convertor(project, experiment, version='2.0.0')

Converting the metadata from SPARC dataset structure (SDS) to Gen3 submittable structure in json format

execute(source_dir, dest_dir)

Converting metadata

Parameters:
  • source_dir (str or pathlib.Path object) – Path to the source (SDS) directory

  • dest_dir (str or pathlib.Path object) – Path to the destination (Gen3) directory

static read_excel(path, sheet_name=None)

Reading Excel data as a python dataframe object

Parameters:
  • path (str or pathlib.Path object) – Path to the Excel file

  • sheet_name (str) – Excel sheet name

Returns:

Data in dataframe object format

Return type:

object

set_schema_dir(path)

Setting the SDS schema directory

Parameters:

path (str) – Path to the SDS schema directory

class dtp.Exporter(auth)

Class for exporting Gen3 metadata

export_node(program, project, node_type, fileformat, filename=None)

Exporting all records in a single Gen3 node

Parameters:
  • program (str) – Program name

  • project (str) – Project name

  • node_type (str) – Node name

  • fileformat (str) – Exported file format (json or tsv)

  • filename (str) – Exported filename

Returns:

List of records (metadata) in dictionary format

Return type:

list
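A short sketch of exporting several nodes in one pass, assuming exporter is a dtp.Exporter built from an authenticated dtp.Auth. The helper export_nodes and the one-file-per-node naming are hypothetical; only the export_node(program, project, node_type, fileformat, filename) call comes from the reference above.

```python
from pathlib import Path


def export_nodes(exporter, program, project, node_types, out_dir, fileformat="json"):
    """Export every listed node to <out_dir>/<node_type>.<fileformat>.

    `exporter` is expected to be a dtp.Exporter (or any object exposing
    export_node with the documented signature). Returns {node_type: records}.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    exported = {}
    for node_type in node_types:
        # One output file per node, named after the node (sketch convention).
        filename = str(out_path / f"{node_type}.{fileformat}")
        exported[node_type] = exporter.export_node(
            program, project, node_type, fileformat, filename
        )
    return exported
```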

export_record(program, project, uuid, fileformat, filename=None)

Exporting the metadata in a single record

Parameters:
  • program (str) – Program name

  • project (str) – Project name

  • uuid (str) – Record UUID

  • fileformat (str) – Exported file format (json or tsv)

  • filename (str) – Exported filename

Returns:

Metadata in a single record

Return type:

dict

save(data, fileformat, path)

Saving the metadata (dict) in json format

Parameters:
  • data (dict) – metadata

  • fileformat (str) – file format (currently only json)

  • path (str) – Path to the save file

class dtp.Querier(auth)

Class for querying Gen3. Also accepts queries in GraphQL syntax.

get_node_records(node, program, project)

Getting all the records in a Gen3 node

Parameters:
  • node (str) – Name of the target node

  • program (str) – program name

  • project (str) – project name

Returns:

A list of records in dictionary format

Return type:

list

get_programs()

Getting all programs that the user has access to

Returns:

List of programs

Return type:

list

get_projects(program)

Getting the projects by program name

Parameters:

program (str) – Name of a Gen3 program

Returns:

List of projects

Return type:

list

graphql_query(query_string, variables=None)

Sending a GraphQL query to Gen3

Parameters:
  • query_string (str) – query in GraphQL syntax

  • variables (dict) – query variables (optional)

Returns:

query response

Return type:

dict
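A sketch of passing a query string together with variables, assuming querier is a dtp.Querier. The helper names are hypothetical, and the node name, the project_id argument and the submitter_id field are assumptions: what can actually be queried depends on the data dictionary of the Gen3 commons in use. The shape of the returned dict is not specified in this reference, so the sketch returns it unchanged.

```python
def build_node_query(node, fields):
    """Return a GraphQL query selecting `fields` from `node` for one project."""
    selection = " ".join(fields)
    return (
        "query ($project_id: [String]) { "
        + node
        + "(project_id: $project_id) { "
        + selection
        + " } }"
    )


def records_for_project(querier, node, project_id, fields=("submitter_id",)):
    """Send the query through querier.graphql_query with its variables."""
    query_string = build_node_query(node, fields)
    # Variables are passed separately so the query text stays reusable.
    variables = {"project_id": [project_id]}
    return querier.graphql_query(query_string, variables)
```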

class dtp.Gen3Submitter(endpoint, credentials)

Class for Gen3 submission

submit_record(program, project, file)

Submitting metadata to Gen3

Parameters:
  • program (str) – Program name

  • project (str) – Project name

  • file (str) – Path to the metadata file
