GRIA Workflow Plugins
The GRIA Workflow Plugins comprise a number of Processors for use in Taverna. Discovery is performed to determine the GRIA applications and workflows hosted by a particular GRIA server, and the appropriate Processors are then made available for use in workflows. After you add a processor to a workflow, some configuration may be needed, although the default processor configuration is often suitable. The sections below describe GRIA application and workflow discovery, the Processors provided by this software and their configuration, and the log files that should be sent with support queries and bug reports if you encounter problems with the software.
Discovery
The GRIA applications and workflows hosted by a GRIA server are discovered as follows. In the Available services panel tree, right-click on the root node, named Available Processors. In the context menu that appears, select Add applications from a GRIA service provider or Add workflows from a Freefluo service, as shown in Figure 12 and Figure 13 respectively. You'll be asked to enter the address (URL) of the GRIA Services or Freefluo service. Usually this involves simply replacing the HOST placeholder in the text field with the name of the server. You may also need to change the URL prefix from https to http if the services aren't secured by SSL.
After discovery is complete, Processors are made available in the Available services panel tree. Discovering GRIA applications adds processors for performing data transfer operations, plus one processor for each application hosted by the server. Similarly, discovering Freefluo workflows adds one processor for each hosted workflow.
Figure 12 Application discovery
Figure 13 Workflow discovery
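The address edits described in the discovery steps (replacing the HOST placeholder and, for services not secured by SSL, switching https to http) can be sketched as a small Python helper. This is a hypothetical illustration only: the plugins perform these steps through the discovery dialog, and the template path `/example/services` and host `gria.example.org` are made-up placeholders, not real GRIA or Freefluo endpoints.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_service_url(template: str, host: str, use_ssl: bool = True) -> str:
    """Fill in the HOST placeholder and choose the URL scheme.

    Hypothetical helper mirroring the manual edits made in the discovery
    dialog; it is not part of the GRIA plugins.
    """
    # Replace only the first HOST occurrence, i.e. the server name.
    url = template.replace("HOST", host, 1)
    parts = urlsplit(url)
    # Services not secured by SSL are addressed with http instead of https.
    scheme = "https" if use_ssl else "http"
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    template = "https://HOST/example/services"  # hypothetical placeholder path
    print(resolve_service_url(template, "gria.example.org", use_ssl=False))
```

Only the scheme and host change; the rest of the address in the text field is kept as entered.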
Processor Configuration
After adding a processor to a workflow, you can configure it by right-clicking on the processor in the Advanced model explorer panel and selecting the appropriate configuration option in the context menu. For example, to configure a job processor, select Configure GRIA job, as below.
Figure 14 Configuring processors
In addition to configuring processors using the Advanced model explorer and the associated dialog windows, some global configuration settings can be provided in configuration files. Details are given in the appropriate subsections below.
Upload processor
The upload processor is used to transfer a file from the client machine to a GRIA server. The uploaded file is said to be staged at the GRIA server. Configuration options for the upload processor include specifying a data stager as well as resource allocation settings. An upload processor always has an input port named localFile, which should be passed a relative or absolute file name for the file to upload. It can optionally have up to two further input ports, each with a name of your choice: one to specify, at workflow runtime, the data stager to upload the file to, and another to obtain, at runtime, the ID of a resource allocation to use if a new data stager is created. Details of optional port configuration are provided below. The upload processor always has a single output port named dataStager, which makes available the ID of the data stager at which the file was stored.
Figure 15 Upload processor configuration
Data Stager
There are three options concerning data stager use. The first, Use existing, should be used when an appropriate data stager exists before the processor begins execution in the workflow. Selecting the associated Settings button displays a dialog for specifying how the data stager should be selected.
Figure 16 Data stager configuration
Three methods for data stager selection are supported. Firstly, a data stager ID can be provided. Secondly, a human-readable alias that identifies the data stager in the GRIA state file can be specified. Finally, the data stager ID can be read from an input port on the upload processor at runtime. Selecting the input port option exposes a new input port on the upload processor, from which the data stager ID will be read.
The second option for data stager use is Create new. Use this option if you want the uploaded file to be stored in a new data stager, created at the GRIA server. When this option is selected, the resource allocation settings become relevant, because in GRIA every data stager (or job) is allocated within the context of a resource allocation.
The final option for data stager use is Use existing if possible. Otherwise, create new. This option is useful if you intend to use an existing data stager but want to tolerate errors that may occur when uploading the data. With this option, the processor first attempts to upload the data to the stager specified in the Settings for existing data stager dialog. If this fails and the error is recoverable (for example, if the resource allocation has no data storage or data transfer resource remaining), the processor creates a new data stager.
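The three data stager options, and in particular the fallback taken only on recoverable errors, can be modelled with a minimal Python sketch. It assumes two hypothetical callables standing in for the real GRIA calls, each returning a data stager ID, and a hypothetical `RecoverableError` exception; none of these names come from the plugins themselves.

```python
from enum import Enum
from typing import Callable


class StagerMode(Enum):
    USE_EXISTING = "Use existing"
    CREATE_NEW = "Create new"
    EXISTING_ELSE_NEW = "Use existing if possible. Otherwise, create new"


class RecoverableError(Exception):
    """Hypothetical: e.g. the resource allocation has no storage or transfer left."""


def upload_with_mode(
    mode: StagerMode,
    upload_to_existing: Callable[[], str],
    create_stager_and_upload: Callable[[], str],
) -> str:
    """Return the ID of the data stager that received the file.

    Both callables are hypothetical stand-ins for the real GRIA calls;
    each returns a data stager ID (what the dataStager port outputs).
    """
    if mode is StagerMode.USE_EXISTING:
        return upload_to_existing()
    if mode is StagerMode.CREATE_NEW:
        return create_stager_and_upload()
    # EXISTING_ELSE_NEW: only recoverable failures trigger the fallback;
    # any other exception propagates unchanged.
    try:
        return upload_to_existing()
    except RecoverableError:
        return create_stager_and_upload()
```

The same structure carries over to the Use existing, if possible. Otherwise, create new. option for resource allocations.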
Authorisations
The authorisations section of the dialog is used to configure access control for data stagers. This is important when the workflow will be deployed to a Freefluo service. To enable a remote client of the Freefluo service to be able to read the output data stagers of the Upload processor, select the checkbox labelled Open authorisation for client to read outputs. Typically, a workflow author should be careful to allow Freefluo service clients to access only the data stagers they require.
Resource Allocation
As mentioned above, whenever a new data stager is used, the settings for how to select or create a resource allocation become applicable. Please see the Resource allocation section below for details.
Changing default settings
The default values for the upload processor settings described above can be changed by editing the configuration file: TAVERNA_HOME/conf/gria-upload-defaults.xml
Job processor
Job processors are used to execute a specific application at a GRIA server. Job processors can be linked together in a workflow by passing references to data stagers (data stager IDs) between them. The number and names of a job processor's input and output ports vary depending on the application that the processor represents: different applications require different numbers and types of input files and produce different numbers of output files, and the ports present on a particular job processor reflect this. In addition, a command line port and a resource allocation port can optionally be exposed. Details of optional port configuration are provided below. Configuration involves specifying a command line for the application, providing the job requirements in the form of the expected amount of compute resource, and specifying how a resource allocation should be selected or created.
Figure 17 Job processor configuration
Command line
The Static radio button should be selected if the command line for the application is known before workflow runtime. In this case, simply provide the command line string in the associated text box. If the command line for the application is to be determined during workflow execution, select the Input port radio button. This will expose on the job processor an input port on which to read the command line. The name for the new input port must be provided in the associated text box.
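The Static versus Input port choice described above can be sketched as a small Python data class. This is a hypothetical model, not the plugin's configuration format: the port name `commandLine` and the argument string `-i input.dat` are made up for illustration. The same static-or-port choice also appears for the account setting of the Freefluo processor.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CommandLineSetting:
    """Hypothetical model of the two radio buttons: static string or input port."""

    static_value: Optional[str] = None
    port_name: Optional[str] = None

    def __post_init__(self) -> None:
        # Radio buttons are mutually exclusive: exactly one field must be set.
        if (self.static_value is None) == (self.port_name is None):
            raise ValueError("set exactly one of static_value or port_name")

    def resolve(self, port_values: Mapping[str, str]) -> str:
        """Return the command line to use when the job processor runs."""
        if self.port_name is not None:
            # Input port option: the value arrives on the named port at runtime.
            return port_values[self.port_name]
        # Static option: the string was fixed before the workflow started.
        return self.static_value


if __name__ == "__main__":
    print(CommandLineSetting(static_value="-i input.dat").resolve({}))
    print(CommandLineSetting(port_name="commandLine").resolve({"commandLine": "-v"}))
```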
Job requirements
In the text box provided, specify the compute resource that you expect the job to require, in standard CPU seconds.
Authorisations
The authorisations section of the dialog is used to configure access control for data stagers. This is important when the workflow will be deployed to a Freefluo service. To enable a remote client of the Freefluo service to be able to read the output data stagers of the Job processor, select the checkbox labelled Open authorisation for client to read outputs. Typically, a workflow author should be careful to allow Freefluo service clients to access only the data stagers they require.
Resource allocation
Whenever a job is to be run at a GRIA server, a resource allocation must be used or created. Please see the Resource allocation section below for details.
Changing default settings
The default settings for job processors can be changed by editing the configuration file: TAVERNA_HOME/conf/gria-job-defaults.xml
In addition, the default values for configuring job polling can be changed by editing: TAVERNA_HOME/conf/job-polling.properties
Download processor
The download processor is the simplest of the available processor types. It is used to download a file staged at a GRIA server to the client machine. The processor always has two input ports and no output ports. The two input ports are dataStager, which should be passed the ID of the data stager at which the file is stored, and localFile, which should be passed the relative or absolute file name under which to save the file.
Freefluo processor
The Freefluo processor is used to run a workflow that's hosted by a remote Freefluo service. Before using Freefluo processors, you should discover the workflows that are hosted at a particular Freefluo server, as detailed above. After workflow discovery has been performed, a Processor will be added to the tree in the Available services panel for every discovered workflow. These processors can be added to workflows in the usual way. Some configuration of Freefluo processors is necessary. Figure 18 shows the configuration dialog for a Freefluo processor. The Service Provider section displays the location of the GRIA service provider that's hosting the workflow. The Account section is for specifying an account id. To be able to run a workflow at a remote Freefluo service, the client must have a valid account with the service provider. The account id can be provided as a static string by selecting the static radio button and entering the account id in the associated text box. Alternatively, if the account id is determined at workflow runtime, the input port radio button can be selected and the name of an input port provided in the associated text box. In this case, the account id should be passed to the Freefluo processor as an input to the input port specified in the Port name text box.
Figure 18 Freefluo processor configuration
Note that, just like job processors, the number and types of input and output ports may be different for each Freefluo processor. Inputs and outputs are determined by the interface of the remote workflow hosted at the Freefluo service.
Resource allocation
When creating a new data stager for an upload processor or when running an application for a job processor, a resource allocation must be selected or created. Therefore, Resource allocation configuration is relevant for both Upload processors and Job Processors. There are three options concerning Resource allocation usage, as seen in the figure below and described in the paragraphs that follow.
Figure 19 Resource allocation options
Use existing
Firstly, the Use existing radio button can be selected so that a resource allocation that exists before processor execution is used. With this option, the associated Settings button displays a dialog for specifying how the resource allocation should be selected; the dialog is shown in Figure 20. An existing resource allocation can be specified by ID, alias or input port, or can be selected automatically. If the Input port option is used, a name for the new input port should be provided in the associated text box; this port is used at processor runtime to read the ID of the resource allocation. If the Automatically select a resource allocation radio button is selected, every appropriate resource allocation (that is, every allocation held at the GRIA server) is tried for the data upload or to start job execution, until one succeeds or there are no more allocations left to try.
Figure 20 Selecting an existing resource allocation
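The automatic selection described above amounts to trying each allocation in turn until one works. A minimal Python sketch, assuming a hypothetical `attempt` callable that performs the upload or starts the job and a hypothetical `AllocationRejected` exception for allocations that cannot support the request:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


class AllocationRejected(Exception):
    """Hypothetical: an allocation cannot support the upload or job."""


def try_allocations(allocation_ids: Iterable[str], attempt: Callable[[str], T]) -> T:
    """Try each allocation in turn and return the first successful result."""
    failures: List[str] = []
    for allocation_id in allocation_ids:
        try:
            return attempt(allocation_id)
        except AllocationRejected as exc:
            # Remember why this allocation failed, then move on to the next one.
            failures.append(f"{allocation_id}: {exc}")
    # No allocations left to try.
    raise AllocationRejected("no resource allocation succeeded (" + "; ".join(failures) + ")")
```

Collecting the individual failure reasons makes the final error more useful when every allocation has been exhausted.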
Create new
The second option (Figure 19) is to select Create new. In this case the associated Settings button should be used to display a dialog (Figure 21) for providing requirements and other settings for the new resource allocation.
Figure 21 Creating a new resource allocation
The Naming section of the dialog is used to optionally provide a human-readable name, or alias, for the new resource allocation. This can be useful if you intend the allocation to outlive the workflow and to be accessible using other tools, such as the GRIA command line client.
The Requirements section is used to request the resources that are expected to be required. Please refer to the GRIA documentation for details of resource allocation requirements and their specification. In brief, this section allows you to specify the amount of compute resource (standard CPU seconds) you require, some machine constraints (performance and memory), data transfer and storage requirements, and the time for which you wish the resource allocation to exist.
The Account section of the dialog (Figure 21) should be used to configure which account the new resource allocation should be charged to. The account can be specified by ID or alias. Alternatively, the Automatically select an account option can be used to search the GRIA state file at processor runtime for the first account held with the GRIA server of interest.
The Life time section of the dialog (Figure 21) is used to specify whether the new resource allocation should be persistent and outlive the workflow, or transient and terminate when the workflow finishes.
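The three ways of choosing the account to charge (by ID, by alias, or automatically as the first account held with the server) can be sketched in Python. The `AccountRecord` type is a hypothetical stand-in for an account entry in the GRIA state file, whose real format is not described here; the IDs and URLs in the usage line are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AccountRecord:
    """Hypothetical stand-in for an account entry in the GRIA state file."""

    account_id: str
    service_url: str  # address of the GRIA server the account is held with
    alias: Optional[str] = None


def select_account(
    records: Iterable[AccountRecord],
    service_url: str,
    *,
    account_id: Optional[str] = None,
    alias: Optional[str] = None,
) -> AccountRecord:
    """Select an account by ID, by alias, or automatically (first match)."""
    held = [r for r in records if r.service_url == service_url]
    if account_id is not None:
        matches = [r for r in held if r.account_id == account_id]
    elif alias is not None:
        matches = [r for r in held if r.alias == alias]
    else:
        # "Automatically select an account": first account held with the server.
        matches = held
    if not matches:
        raise LookupError(f"no matching account held with {service_url}")
    return matches[0]


if __name__ == "__main__":
    records = [AccountRecord("acc-1", "https://a.example/x"),
               AccountRecord("acc-2", "https://b.example/x", alias="project")]
    print(select_account(records, "https://b.example/x").account_id)
```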
Use existing, if possible. Otherwise, create new.
The third option (Figure 19) is to select Use existing, if possible. Otherwise, create new. This option is useful if you want to try an existing resource allocation first but have a new one created should this fail. If this option is used, the processor first tries to use an existing resource allocation according to the settings specified in the Settings for existing allocation dialog of Figure 20. If this is unsuccessful and the error is recoverable, the processor creates a new resource allocation according to the settings in the Settings for new resource allocation dialog of Figure 21.
UploadFromString processor
The UploadFromString processor is used to transfer data from the client to a GRIA server. The data to upload must be passed to the processor as a string, which is useful for transferring relatively small amounts of data. If the data is large, it may be more appropriate to use the Upload processor. The diagram below shows an UploadFromString processor, as viewed in the Workflow diagram panel in Taverna. The data to upload should be passed to the stringData input port. Configuration is identical to that of the Upload processor. Like the Upload processor, the UploadFromString processor always has a single output port named dataStager, which makes available the ID of the data stager at which the data was stored.
Figure 22 UploadFromString processor
DownloadToString processor
The DownloadToString processor is used to transfer data from a GRIA server to the client. The downloaded data is made available as a string by the processor. This is useful for transferring relatively small amounts of data without having to use files on the file system. If the size of the data is significant, it may be more appropriate to use the Download processor, which writes the downloaded data to a file and therefore uses less memory. The diagram below shows a DownloadToString processor, as viewed in the Workflow diagram panel in Taverna. The ID of a GRIA data stager should be passed to the dataStager input port. After execution, the downloaded data is made available as a string on the fileAsString output port.
Figure 23 DownloadToString processor
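The memory trade-off between the Download and DownloadToString processors can be illustrated with a generic Python sketch. It does not use any GRIA API: the `stream` argument is an assumed file-like object standing in for the data received from the data stager, and the 64 KiB chunk size is an arbitrary illustrative value.

```python
import io
import shutil
from typing import BinaryIO


def download_to_file(stream: BinaryIO, local_file: str, chunk_size: int = 64 * 1024) -> None:
    """Copy the stream to disk in fixed-size chunks, so memory use stays bounded."""
    with open(local_file, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)


def download_to_string(stream: BinaryIO, encoding: str = "utf-8") -> str:
    """Read the whole stream into memory and decode it (fine for small data)."""
    return stream.read().decode(encoding)


if __name__ == "__main__":
    # io.BytesIO stands in for the data received from the data stager.
    print(download_to_string(io.BytesIO(b"small result")))
```

Writing to a file keeps only one chunk in memory at a time, whereas returning a string requires holding the entire payload, which is why the string variants suit small data.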
Log files
The log file for the plugins can be found at: TAVERNA_HOME/gria-plugins/logs/plugins.log
© University of Southampton IT Innovation Centre, 2005