Package com.krux.hyperion.aws

package aws

Type Members

  1. trait AdpAction extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

  2. trait AdpActivity extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    AWS Data Pipeline activity objects.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-activities.html

  3. case class AdpCopyActivity(id: String, name: Option[String], input: AdpRef[AdpDataNode], output: AdpRef[AdpDataNode], workerGroup: Option[String], runsOn: Option[AdpRef[AdpEc2Resource]], dependsOn: Option[Seq[AdpRef[AdpActivity]]], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onLateAction: Option[Seq[AdpRef[AdpSnsAlarm]]], attemptTimeout: Option[String], lateAfterTimeout: Option[String], maximumRetries: Option[String], retryDelay: Option[String], failureAndRerunMode: Option[String], maxActiveInstances: Option[String]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html

    id

    The ID of the object. IDs must be unique within a pipeline definition.

    name

    The optional, user-defined label of the object. If you do not provide a name for an object in a pipeline definition, AWS Data Pipeline automatically duplicates the value of id.

    input

    The input data source.

    output

    The location for the output.

    dependsOn

    Required for AdpActivity
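
    A minimal construction sketch against the signature above. The AdpRef arguments are assumed to be built elsewhere in the pipeline definition; all ids, names, and values are illustrative placeholders.

    // Sketch only: `in`, `out`, and `ec2` are AdpRef values defined elsewhere.
    def copyActivity(in: AdpRef[AdpDataNode],
                     out: AdpRef[AdpDataNode],
                     ec2: AdpRef[AdpEc2Resource]): AdpCopyActivity =
      AdpCopyActivity(
        id = "CopyActivity_Events",
        name = Some("copy-events"),
        input = in,
        output = out,
        workerGroup = None,
        runsOn = Some(ec2),
        dependsOn = None,
        precondition = None,
        onFail = None,
        onSuccess = None,
        onLateAction = None,
        attemptTimeout = None,
        lateAfterTimeout = None,
        maximumRetries = Some("3"),
        retryDelay = None,
        failureAndRerunMode = None,
        maxActiveInstances = None
      )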

  4. case class AdpCsvDataFormat(id: String, name: Option[String], column: Option[Seq[String]], escapeChar: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    CSV Data Format

    A comma-delimited data format where the column separator is a comma and the record separator is a newline character.

  5. case class AdpCustomDataFormat(id: String, name: Option[String], column: Option[Seq[String]], columnSeparator: String, recordSeparator: String) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    Custom Data Format

    A custom data format defined by a combination of a certain column separator, record separator, and escape character.
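
    A minimal construction sketch; the separator characters and column names are illustrative.

    val pipeDelimited = AdpCustomDataFormat(
      id = "PipeDelimitedFormat",
      name = Some("pipe-delimited"),
      column = Some(Seq("Name STRING", "Score INT")),
      columnSeparator = "|",
      recordSeparator = "\n"
    )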

  6. trait AdpDataFormat extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    Defines AWS Data Pipeline Data Formats

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-dataformats.html

  7. trait AdpDataNode extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    AWS Data Pipeline DataNode objects

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-datanodes.html

  8. abstract class AdpDataPipelineAbstractObject extends AdpObject

  9. trait AdpDataPipelineDefaultObject extends AdpDataPipelineAbstractObject


    Each data pipeline can have a default object

  10. trait AdpDataPipelineObject extends AdpDataPipelineAbstractObject

    The base class of all AWS Data Pipeline objects.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-objects.html

  11. trait AdpDatabase extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    AWS Data Pipeline database objects.

    Ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-databases.html

  12. case class AdpDynamoDBDataExistsPrecondition(id: String, name: Option[String], tableName: String, role: String, preconditionTimeout: Option[String], maximumRetries: Option[String], onFail: Option[Seq[AdpRef[AdpAction]]], onLateAction: Option[Seq[AdpRef[AdpAction]]], onSuccess: Option[Seq[AdpRef[AdpAction]]]) extends AdpDataPipelineAbstractObject with AdpPrecondition with Product with Serializable

    A precondition to check that data exists in a DynamoDB table.

    tableName

    The DynamoDB table to check.

  13. case class AdpDynamoDBDataFormat(id: String, name: Option[String], column: Option[Seq[String]]) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    DynamoDBDataFormat

    Applies a schema to a DynamoDB table to make it accessible by a Hive query. DynamoDBDataFormat is used with a HiveActivity object and a DynamoDBDataNode input and output. DynamoDBDataFormat requires that you specify all columns in your Hive query. For more flexibility to specify certain columns in a Hive query or Amazon S3 support, see DynamoDBExportDataFormat.

  14. case class AdpDynamoDBDataNode(id: String, name: Option[String], tableName: String, region: Option[String], dynamoDBDataFormat: Option[AdpRef[AdpDataFormat]], readThroughputPercent: Option[String], writeThroughputPercent: Option[String], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]]) extends AdpDataPipelineAbstractObject with AdpDataNode with Product with Serializable

    DynamoDB DataNode

    tableName

    The DynamoDB table.

    region

    The AWS region where the DynamoDB table exists. It's used by HiveActivity when it performs staging for DynamoDB tables in Hive. For more information, see Using a Pipeline with Resources in Multiple Regions.

    dynamoDBDataFormat

    Applies a schema to a DynamoDB table to make it accessible by a Hive query.

    readThroughputPercent

    Sets the rate of read operations to keep your DynamoDB provisioned throughput rate in the allocated range for your table. The value is a double between 0.1 and 1.0, inclusive. For more information, see Specifying Read and Write Requirements for Tables.

    writeThroughputPercent

    Sets the rate of write operations to keep your DynamoDB provisioned throughput rate in the allocated range for your table. The value is a double between 0.1 and 1.0, inclusive. For more information, see Specifying Read and Write Requirements for Tables.
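
    A minimal construction sketch; the table name, region, and throughput values are illustrative.

    val eventsTable = AdpDynamoDBDataNode(
      id = "DynamoDBEvents",
      name = Some("events-table"),
      tableName = "events",
      region = Some("us-east-1"),
      dynamoDBDataFormat = None,
      readThroughputPercent = Some("0.5"),
      writeThroughputPercent = Some("0.5"),
      precondition = None,
      onSuccess = None,
      onFail = None
    )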

  15. case class AdpDynamoDBExportDataFormat(id: String, name: Option[String], column: Option[Seq[String]]) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    DynamoDBExportDataFormat

    Applies a schema to a DynamoDB table to make it accessible by a Hive query. Use DynamoDBExportDataFormat with a HiveCopyActivity object and DynamoDBDataNode or S3DataNode input and output. DynamoDBExportDataFormat has the following benefits:

    * Provides both DynamoDB and Amazon S3 support
    * Allows you to filter data by certain columns in your Hive query
    * Exports all attributes from DynamoDB even if you have a sparse schema

  16. case class AdpDynamoDBTableExistsPrecondition(id: String, name: Option[String], tableName: String, role: String, preconditionTimeout: Option[String], maximumRetries: Option[String], onFail: Option[Seq[AdpRef[AdpAction]]], onLateAction: Option[Seq[AdpRef[AdpAction]]], onSuccess: Option[Seq[AdpRef[AdpAction]]]) extends AdpDataPipelineAbstractObject with AdpPrecondition with Product with Serializable

    A precondition to check that the DynamoDB table exists.

    tableName

    The DynamoDB table to check.

  17. case class AdpEc2Resource(id: String, name: Option[String], instanceType: Option[String], imageId: Option[String], role: Option[String], resourceRole: Option[String], runAsUser: Option[String], keyPair: Option[String], region: Option[String], availabilityZone: Option[String], subnetId: Option[String], associatePublicIpAddress: Option[String], securityGroups: Option[Seq[String]], securityGroupIds: Option[Seq[String]], spotBidPrice: Option[String], useOnDemandOnLastAttempt: Option[String], initTimeout: Option[String], terminateAfter: Option[String], actionOnResourceFailure: Option[String], actionOnTaskFailure: Option[String], httpProxy: Option[AdpRef[AdpHttpProxy]], maximumRetries: Option[String]) extends AdpDataPipelineAbstractObject with AdpResource with Product with Serializable

    An EC2 instance that will perform the work defined by a pipeline activity.

    instanceType

    The type of EC2 instance to use for the resource pool. The default value is m1.small.

    imageId

    The AMI version to use for the EC2 instances. For more information, see Amazon Machine Images (AMIs).

    role

    The IAM role to use to create the EC2 instance.

    resourceRole

    The IAM role to use to control the resources that the EC2 instance can access.

    keyPair

    The name of the key pair. If you launch an EC2 instance without specifying a key pair, you can't log on to it.

    region

    A region code to specify that the resource should run in a different region.

    availabilityZone

    The Availability Zone in which to launch the EC2 instance.

    subnetId

    The ID of the subnet to launch the instance into.

    associatePublicIpAddress

    Indicates whether to assign a public IP address to an instance. An instance in a VPC can't access Amazon S3 unless it has a public IP address or a network address translation (NAT) instance with proper routing configuration. If the instance is in EC2-Classic or a default VPC, the default value is true. Otherwise, the default value is false.

    securityGroups

    The names of one or more security groups to use for the instances in the resource pool. By default, Amazon EC2 uses the default security group.

    securityGroupIds

    The IDs of one or more security groups to use for the instances in the resource pool. By default, Amazon EC2 uses the default security group.

    spotBidPrice

    The Spot Instance bid price for Ec2Resources: the maximum dollar amount for your Spot Instance bid, expressed as a decimal value between 0 and 20.00, exclusive.

    useOnDemandOnLastAttempt

    On the last attempt to request a resource, make a request for On-Demand Instances rather than Spot Instances. This ensures that if all previous attempts have failed, the last attempt is not interrupted by changes in the Spot market. The default value is true.

    terminateAfter

    The amount of time to wait before terminating the resource.

    actionOnResourceFailure

    Action to take when the resource fails.

    actionOnTaskFailure

    Action to take when the task associated with this resource fails.

    maximumRetries

    The maximum number of attempt retries on failure.
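
    A minimal construction sketch for a Spot-backed instance; the instance type, key pair name, bid price, and timeout are illustrative values, and every field not needed here is set to None.

    val worker = AdpEc2Resource(
      id = "Ec2Worker",
      name = Some("ec2-worker"),
      instanceType = Some("m1.small"),
      imageId = None,
      role = None,
      resourceRole = None,
      runAsUser = None,
      keyPair = Some("my-key-pair"),
      region = None,
      availabilityZone = None,
      subnetId = None,
      associatePublicIpAddress = None,
      securityGroups = None,
      securityGroupIds = None,
      spotBidPrice = Some("0.05"),
      useOnDemandOnLastAttempt = Some("true"),
      initTimeout = None,
      terminateAfter = Some("8 hours"),
      actionOnResourceFailure = None,
      actionOnTaskFailure = None,
      httpProxy = None,
      maximumRetries = None
    )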

  18. case class AdpEmrActivity(id: String, name: Option[String], step: Seq[String], preStepCommand: Option[Seq[String]], postStepCommand: Option[Seq[String]], input: Option[Seq[AdpRef[AdpDataNode]]], output: Option[Seq[AdpRef[AdpDataNode]]], workerGroup: Option[String], runsOn: Option[AdpRef[AdpEmrCluster]], dependsOn: Option[Seq[AdpRef[AdpActivity]]], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onLateAction: Option[Seq[AdpRef[AdpSnsAlarm]]], attemptTimeout: Option[String], lateAfterTimeout: Option[String], maximumRetries: Option[String], retryDelay: Option[String], failureAndRerunMode: Option[String], maxActiveInstances: Option[String]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    Runs an Amazon EMR job.

    AWS Data Pipeline uses a different format for steps than Amazon EMR; for example, AWS Data Pipeline uses comma-separated arguments after the JAR name in the EmrActivity step field.

    step

    One or more steps for the cluster to run. To specify multiple steps, up to 255, add multiple step fields. Use comma-separated arguments after the JAR name; for example, "s3://example-bucket/MyWork.jar,arg1,arg2,arg3".

    preStepCommand

    Shell scripts to be run before any steps are run. To specify multiple scripts, up to 255, add multiple preStepCommand fields.

    postStepCommand

    Shell scripts to be run after all steps are finished. To specify multiple scripts, up to 255, add multiple postStepCommand fields.

    input

    The input data source.

    output

    The location for the output.

    runsOn

    The Amazon EMR cluster on which to run this activity.
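
    A minimal construction sketch; the cluster reference is assumed to be built elsewhere in the pipeline definition, and the bucket, step arguments, ids, and names are illustrative placeholders. The step string uses comma-separated arguments after the JAR name, as described above.

    // Sketch only: `cluster` is an AdpRef to an AdpEmrCluster defined elsewhere.
    def emrActivity(cluster: AdpRef[AdpEmrCluster]): AdpEmrActivity =
      AdpEmrActivity(
        id = "EmrActivity_WordCount",
        name = Some("word-count"),
        step = Seq("s3://example-bucket/MyWork.jar,arg1,arg2,arg3"),
        preStepCommand = None,
        postStepCommand = None,
        input = None,
        output = None,
        workerGroup = None,
        runsOn = Some(cluster),
        dependsOn = None,
        precondition = None,
        onFail = None,
        onSuccess = None,
        onLateAction = None,
        attemptTimeout = None,
        lateAfterTimeout = None,
        maximumRetries = None,
        retryDelay = None,
        failureAndRerunMode = None,
        maxActiveInstances = None
      )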

  19. class AdpEmrCluster extends AdpDataPipelineAbstractObject with AdpResource

    Represents the configuration of an Amazon EMR cluster. This object is used by EmrActivity to launch a cluster.

  20. case class AdpEmrConfiguration(id: String, name: Option[String], classification: Option[String], property: Option[Seq[AdpRef[AdpProperty]]], configuration: Option[Seq[AdpRef[AdpEmrConfiguration]]]) extends AdpDataPipelineAbstractObject with AdpDataPipelineObject with Product with Serializable

  21. case class AdpExistsPrecondition(id: String, name: Option[String], role: String, preconditionTimeout: Option[String], maximumRetries: Option[String], onFail: Option[Seq[AdpRef[AdpAction]]], onLateAction: Option[Seq[AdpRef[AdpAction]]], onSuccess: Option[Seq[AdpRef[AdpAction]]]) extends AdpDataPipelineAbstractObject with AdpPrecondition with Product with Serializable


    Checks whether a data node object exists.

  22. class AdpHadoopActivity extends AdpDataPipelineAbstractObject with AdpActivity

  23. class AdpHiveActivity extends AdpDataPipelineAbstractObject with AdpActivity


    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-hiveactivity.html

  24. case class AdpHiveCopyActivity(id: String, name: Option[String], filterSql: Option[String], generatedScriptsPath: Option[String], input: Option[AdpRef[AdpDataNode]], output: Option[AdpRef[AdpDataNode]], preActivityTaskConfig: Option[AdpRef[AdpShellScriptConfig]], postActivityTaskConfig: Option[AdpRef[AdpShellScriptConfig]], workerGroup: Option[String], runsOn: Option[AdpRef[AdpEmrCluster]], dependsOn: Option[Seq[AdpRef[AdpActivity]]], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onLateAction: Option[Seq[AdpRef[AdpSnsAlarm]]], attemptTimeout: Option[String], lateAfterTimeout: Option[String], maximumRetries: Option[String], retryDelay: Option[String], failureAndRerunMode: Option[String], maxActiveInstances: Option[String]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-hivecopyactivity.html

    filterSql

    A Hive SQL statement fragment that filters a subset of DynamoDB or Amazon S3 data to copy. The filter should only contain predicates and not begin with a WHERE clause, because AWS Data Pipeline adds it automatically.

    generatedScriptsPath

    An Amazon S3 path capturing the Hive script that ran after all the expressions in it were evaluated, including staging information. This script is stored for troubleshooting purposes.

    input

    The input data node. This must be S3DataNode or DynamoDBDataNode. If you use DynamoDBDataNode, specify a DynamoDBExportDataFormat.

    output

    The output data node. If input is S3DataNode, this must be DynamoDBDataNode. Otherwise, this can be S3DataNode or DynamoDBDataNode. If you use DynamoDBDataNode, specify a DynamoDBExportDataFormat.

  25. case class AdpHttpProxy(id: String, name: Option[String], hostname: Option[String], port: Option[String], username: Option[String], *password: Option[String], windowsDomain: Option[String], windowsWorkGroup: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataPipelineObject with Product with Serializable


    hostname

    The host of the proxy which Task Runner clients use to connect to AWS services.

    port

    Port of the proxy host which the Task Runner clients use to connect to AWS services.

    username

    The username for the proxy.

    *password

    The password for the proxy.

    windowsDomain

    The Windows domain name for an NTLM proxy.

    windowsWorkGroup

    The Windows workgroup name for an NTLM proxy.

  26. case class AdpJdbcDatabase(id: String, name: Option[String], connectionString: String, databaseName: Option[String], username: String, *password: String, jdbcDriverJarUri: Option[String], jdbcDriverClass: String, jdbcProperties: Option[Seq[String]]) extends AdpDataPipelineAbstractObject with AdpDatabase with Product with Serializable

    Defines a JDBC database.

    connectionString

    The JDBC connection string to access the database.

    jdbcDriverClass

    The driver class to load before establishing the JDBC connection.
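
    A minimal construction sketch; the connection details are placeholders, and the back-quoting of the `*password` field name is assumed because it is not a plain Scala identifier.

    val mysqlDb = AdpJdbcDatabase(
      id = "MyJdbcDatabase",
      name = Some("mysql-db"),
      connectionString = "jdbc:mysql://db.example.com:3306/reports",
      databaseName = Some("reports"),
      username = "pipeline_user",
      `*password` = "change-me",
      jdbcDriverJarUri = None,
      jdbcDriverClass = "com.mysql.jdbc.Driver",
      jdbcProperties = None
    )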

  27. trait AdpObject extends AnyRef

  28. case class AdpOnDemandSchedule(id: String, name: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataPipelineObject with Product with Serializable

    Defines the timing of a scheduled event, such as when an activity runs.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html

  29. case class AdpParameter(id: String, type: String = "String", description: Option[String] = None, optional: String = "false", allowedValues: Option[Seq[String]] = None, isArray: String = "false", default: Option[String] = None) extends AdpObject with Product with Serializable


    AdpParameter is a pipeline parameter definition.
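
    A minimal construction sketch that defines a string-typed parameter with a default value; the id, type, and values are illustrative. The `type` argument is back-quoted because type is a Scala keyword, and the remaining fields fall back to their declared defaults.

    val s3PathParam = AdpParameter(
      id = "myS3OutputPath",
      `type` = "AWS::S3::ObjectKey",
      description = Some("Output S3 path"),
      default = Some("s3://example-bucket/output/")
    )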

  30. class AdpPigActivity extends AdpDataPipelineAbstractObject with AdpActivity


    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-pigactivity.html

  31. trait AdpPrecondition extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    A condition that must be met before the object can run. The activity cannot run until all its conditions are met.

  32. case class AdpProperty(id: String, name: Option[String], key: Option[String], value: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataPipelineObject with Product with Serializable

  33. case class AdpRdsDatabase(id: String, name: Option[String], databaseName: Option[String], jdbcProperties: Option[Seq[String]], username: String, *password: String, rdsInstanceId: String, region: Option[String], jdbcDriverJarUri: Option[String]) extends AdpDataPipelineAbstractObject with AdpDatabase with Product with Serializable


    Defines an Amazon RDS database.

  34. case class AdpRecurringSchedule(id: String, name: Option[String], period: String, startAt: Option[String], startDateTime: Option[github.nscala_time.time.Imports.DateTime], endDateTime: Option[github.nscala_time.time.Imports.DateTime], occurrences: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataPipelineObject with Product with Serializable

    Defines the timing of a scheduled event, such as when an activity runs.

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-schedule.html

    period

    How often the pipeline should run. The format is "N [minutes|hours|days|weeks|months]", where N is a number followed by one of the time specifiers. For example, "15 minutes" runs the pipeline every 15 minutes. The minimum period is 15 minutes and the maximum period is 3 years.

    startAt

    The date and time at which to start the scheduled pipeline runs. Valid value is FIRST_ACTIVATION_DATE_TIME. FIRST_ACTIVATION_DATE_TIME is assumed to be the current date and time.

    startDateTime

    The date and time to start the scheduled runs. You must use either startDateTime or startAt but not both.

    endDateTime

    The date and time to end the scheduled runs. Must be a date and time later than the value of startDateTime or startAt. The default behavior is to schedule runs until the pipeline is shut down.

    occurrences

    The number of times to execute the pipeline after it's activated. You can't use occurrences with endDateTime.
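
    A minimal construction sketch for a daily schedule that starts at first activation and stops after 30 runs; the id, name, and values are illustrative.

    val daily = AdpRecurringSchedule(
      id = "DailySchedule",
      name = Some("daily"),
      period = "1 days",
      startAt = Some("FIRST_ACTIVATION_DATE_TIME"),
      startDateTime = None,
      endDateTime = None,
      occurrences = Some("30")
    )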

  35. case class AdpRedshiftCopyActivity(id: String, name: Option[String], insertMode: String, transformSql: Option[String], queue: Option[String], commandOptions: Option[Seq[String]], input: AdpRef[AdpDataNode], output: AdpRef[AdpDataNode], workerGroup: Option[String], runsOn: Option[AdpRef[AdpEc2Resource]], dependsOn: Option[Seq[AdpRef[AdpActivity]]], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onLateAction: Option[Seq[AdpRef[AdpSnsAlarm]]], attemptTimeout: Option[String], lateAfterTimeout: Option[String], maximumRetries: Option[String], retryDelay: Option[String], failureAndRerunMode: Option[String], maxActiveInstances: Option[String]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-redshiftcopyactivity.html

    id

    required for AdpDataPipelineObject

    name

    required for AdpDataPipelineObject

    insertMode

    Determines what AWS Data Pipeline does with pre-existing data in the target table that overlaps with rows in the data to be loaded. Valid values are KEEP_EXISTING, OVERWRITE_EXISTING, and TRUNCATE.

    transformSql

    The SQL SELECT expression used to transform the input data.

    queue

    Corresponds to the query_group setting in Amazon Redshift, which allows you to assign and prioritize concurrent activities based on their placement in queues. Amazon Redshift limits the number of simultaneous connections to 15.

    commandOptions

    Takes COPY parameters to pass to the Amazon Redshift data node.

    input

    The input data node. The data source can be Amazon S3, DynamoDB, or Amazon Redshift.

    output

    The output data node. The output location can be Amazon S3 or Amazon Redshift.

    runsOn

    Required for AdpActivity

    dependsOn

    Required for AdpActivity
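
    A minimal construction sketch; the AdpRef arguments are assumed to be built elsewhere in the pipeline definition, OVERWRITE_EXISTING is one of the documented insertMode values, and the COPY option, ids, and names are illustrative.

    def redshiftCopy(in: AdpRef[AdpDataNode],
                     out: AdpRef[AdpDataNode],
                     ec2: AdpRef[AdpEc2Resource]): AdpRedshiftCopyActivity =
      AdpRedshiftCopyActivity(
        id = "RedshiftCopy_Events",
        name = Some("load-events"),
        insertMode = "OVERWRITE_EXISTING",
        transformSql = None,
        queue = None,
        commandOptions = Some(Seq("GZIP")),
        input = in,
        output = out,
        workerGroup = None,
        runsOn = Some(ec2),
        dependsOn = None,
        precondition = None,
        onFail = None,
        onSuccess = None,
        onLateAction = None,
        attemptTimeout = None,
        lateAfterTimeout = None,
        maximumRetries = None,
        retryDelay = None,
        failureAndRerunMode = None,
        maxActiveInstances = None
      )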

  36. case class AdpRedshiftDataNode(id: String, name: Option[String], createTableSql: Option[String], database: AdpRef[AdpRedshiftDatabase], schemaName: Option[String], tableName: String, primaryKeys: Option[Seq[String]], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]]) extends AdpDataPipelineAbstractObject with AdpDataNode with Product with Serializable

    Defines a data node using Amazon Redshift.

    primaryKeys

    If the destination table in RedshiftCopyActivity does not declare a primary key, you can specify a list of columns using primaryKeys, which will then act as a mergeKey. However, if you have an existing primaryKey defined in a Redshift table, this setting overrides the existing key.

  37. case class AdpRedshiftDatabase(id: String, name: Option[String], clusterId: String, connectionString: Option[String], databaseName: Option[String], username: String, *password: String, jdbcProperties: Option[Seq[String]]) extends AdpDataPipelineAbstractObject with AdpDatabase with Product with Serializable

    Defines an Amazon Redshift database.

    clusterId

    The identifier provided by the user when the Amazon Redshift cluster was created. For example, if the endpoint for your Amazon Redshift cluster is mydb.example.us-east-1.redshift.amazonaws.com, the correct clusterId value is mydb. In the Amazon Redshift console, this value is "Cluster Name".

    connectionString

    The JDBC endpoint for connecting to an Amazon Redshift instance owned by a different account than the one that owns the pipeline.

  38. case class AdpRef[+T <: AdpDataPipelineAbstractObject] extends Product with Serializable

    A reference to an existing AWS Data Pipeline object.

    more details: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-pipeline-expressions.html

  39. case class AdpRegExDataFormat(id: String, name: Option[String], column: Option[Seq[String]], inputRegEx: String, outputFormat: String) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    RegEx Data Format

    A custom data format defined by a regular expression.

    inputRegEx

    The regular expression to parse an S3 input file. inputRegEx provides a way to retrieve columns from relatively unstructured data in a file.

    outputFormat

    The column fields retrieved by inputRegEx, but referenced as %1, %2, %3, etc. using Java formatter syntax.
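
    A minimal construction sketch that splits each input line into two whitespace-separated columns; the id, column names, and patterns are illustrative.

    val twoColumnFormat = AdpRegExDataFormat(
      id = "TwoColumnFormat",
      name = Some("two-column"),
      column = Some(Seq("host STRING", "path STRING")),
      inputRegEx = "(\\S+) (\\S+)",
      outputFormat = "%1$s %2$s"
    )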

  40. trait AdpResource extends AdpDataPipelineAbstractObject with AdpDataPipelineObject

    Defines the AWS Data Pipeline Resources

    ref: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-resources.html

  41. case class AdpS3DataNode(id: String, name: Option[String], directoryPath: Option[String], filePath: Option[String], dataFormat: Option[AdpRef[AdpDataFormat]], manifestFilePath: Option[String], compression: Option[String], s3EncryptionType: Option[String], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]]) extends AdpDataPipelineAbstractObject with AdpDataNode with Product with Serializable


    You must provide either a filePath or directoryPath value.
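
    A minimal construction sketch for a directory-based node; alternatively set filePath and leave directoryPath as None, but never both. The bucket path and compression value are illustrative.

    val outputDir = AdpS3DataNode(
      id = "OutputS3Node",
      name = Some("output-dir"),
      directoryPath = Some("s3://example-bucket/output/"),
      filePath = None,
      dataFormat = None,
      manifestFilePath = None,
      compression = Some("gzip"),
      s3EncryptionType = None,
      precondition = None,
      onSuccess = None,
      onFail = None
    )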

  42. case class AdpS3KeyExistsPrecondition(id: String, name: Option[String], s3Key: String, role: String, preconditionTimeout: Option[String], maximumRetries: Option[String], onFail: Option[Seq[AdpRef[AdpAction]]], onLateAction: Option[Seq[AdpRef[AdpAction]]], onSuccess: Option[Seq[AdpRef[AdpAction]]]) extends AdpDataPipelineAbstractObject with AdpPrecondition with Product with Serializable

    Checks whether a key exists in an Amazon S3 data node.

    s3Key

    Amazon S3 key to check for existence.

  43. case class AdpS3PrefixNotEmptyPrecondition(id: String, name: Option[String], s3Prefix: String, role: String, preconditionTimeout: Option[String], maximumRetries: Option[String], onFail: Option[Seq[AdpRef[AdpAction]]], onLateAction: Option[Seq[AdpRef[AdpAction]]], onSuccess: Option[Seq[AdpRef[AdpAction]]]) extends AdpDataPipelineAbstractObject with AdpPrecondition with Product with Serializable

    A precondition to check that the Amazon S3 objects with the given prefix (represented as a URI) are present.

    s3Prefix

    The Amazon S3 prefix to check for existence of objects.

  44. class AdpShellCommandActivity extends AdpDataPipelineAbstractObject with AdpActivity

    Runs a command on an EC2 node. You specify the input S3 location, output S3 location and the script/command.

  45. case class AdpShellCommandPrecondition(id: String, name: Option[String], command: Option[String], scriptUri: Option[String], scriptArgument: Option[Seq[String]], stdout: Option[String], stderr: Option[String], role: String, preconditionTimeout: Option[String], maximumRetries: Option[String], onFail: Option[Seq[AdpRef[AdpAction]]], onLateAction: Option[Seq[AdpRef[AdpAction]]], onSuccess: Option[Seq[AdpRef[AdpAction]]]) extends AdpDataPipelineAbstractObject with AdpPrecondition with Product with Serializable

    A Unix/Linux shell command that can be run as a precondition.

    command

    The command to run. This value and any associated parameters must function in the environment from which you are running the Task Runner.

    scriptUri

    An Amazon S3 URI path for a file to download and run as a shell command. Only one of the scriptUri and command fields should be present. scriptUri cannot take parameters; use command instead.

    scriptArgument

    A list of arguments to pass to the shell script.

    stdout

    The Amazon S3 path that receives redirected output from the command. If you use the runsOn field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However if you specify the workerGroup field, a local file path is permitted.

    stderr

    The Amazon S3 path that receives redirected system error messages from the command. If you use the runsOn field, this must be an Amazon S3 path because of the transitory nature of the resource running your activity. However if you specify the workerGroup field, a local file path is permitted.
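
    A minimal construction sketch that runs an inline command as the precondition check; the command, role name, timeout, and ids are illustrative placeholders.

    val checkMarker = AdpShellCommandPrecondition(
      id = "CheckMarkerFile",
      name = Some("check-marker-file"),
      command = Some("aws s3 ls s3://example-bucket/_SUCCESS"),
      scriptUri = None,
      scriptArgument = None,
      stdout = None,
      stderr = None,
      role = "DataPipelineDefaultRole",
      preconditionTimeout = Some("1 hours"),
      maximumRetries = Some("2"),
      onFail = None,
      onLateAction = None,
      onSuccess = None
    )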

  46. case class AdpShellScriptConfig(id: String, name: Option[String], scriptUri: String, scriptArgument: Option[Seq[String]]) extends AdpDataPipelineAbstractObject with AdpDataPipelineObject with Product with Serializable

  47. case class AdpSnsAlarm(id: String, name: Option[String], subject: String, message: String, topicArn: String, role: String) extends AdpDataPipelineAbstractObject with AdpAction with Product with Serializable

    Sends an Amazon SNS notification message when an activity fails or finishes successfully.

    subject

    The subject line of the Amazon SNS notification message.

    message

    The body text of the Amazon SNS notification.

    topicArn

    The destination Amazon SNS topic ARN for the message.

    role

    The IAM role to use to create the Amazon SNS alarm.
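
    A minimal construction sketch; the topic ARN, role name, and message text are illustrative placeholders, and the #{...} fragments are AWS Data Pipeline expressions evaluated at run time.

    val failureAlarm = AdpSnsAlarm(
      id = "FailureAlarm",
      name = Some("failure-alarm"),
      subject = "Pipeline activity failed",
      message = "Activity #{node.name} failed at #{node.@scheduledStartTime}",
      topicArn = "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
      role = "DataPipelineDefaultRole"
    )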

  48. case class AdpSqlActivity(id: String, name: Option[String], script: Option[String], scriptUri: Option[String], scriptArgument: Option[Seq[String]], database: AdpRef[AdpDatabase], queue: Option[String], workerGroup: Option[String], runsOn: Option[AdpRef[AdpEc2Resource]], dependsOn: Option[Seq[AdpRef[AdpActivity]]], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onLateAction: Option[Seq[AdpRef[AdpSnsAlarm]]], attemptTimeout: Option[String], lateAfterTimeout: Option[String], maximumRetries: Option[String], retryDelay: Option[String], failureAndRerunMode: Option[String], maxActiveInstances: Option[String]) extends AdpDataPipelineAbstractObject with AdpActivity with Product with Serializable

    Runs a SQL query on a database. You specify the input table where the SQL query is run and the output table where the results are stored. If the output table doesn't exist, this operation creates a new table with that name.

    script

    The SQL script to run. For example:

    insert into output select * from input where lastModified in range (?, ?)

    The script is not evaluated as an expression; scriptArgument values are used to supply the ? placeholders instead.

    scriptArgument

    A list of variables for the script.

    Note

    that scriptUri is deliberately missing from this implementation, as there does not seem to be any use case for now.
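
    A minimal construction sketch; the database and resource references are assumed to be built elsewhere in the pipeline definition, and the script, arguments, ids, and names are illustrative. The two scriptArgument values fill the ? placeholders in the script.

    def sqlActivity(db: AdpRef[AdpDatabase], ec2: AdpRef[AdpEc2Resource]): AdpSqlActivity =
      AdpSqlActivity(
        id = "SqlActivity_Load",
        name = Some("load-window"),
        script = Some("insert into output select * from input where lastModified in range (?, ?)"),
        scriptUri = None,
        scriptArgument = Some(Seq("#{@scheduledStartTime}", "#{@scheduledEndTime}")),
        database = db,
        queue = None,
        workerGroup = None,
        runsOn = Some(ec2),
        dependsOn = None,
        precondition = None,
        onFail = None,
        onSuccess = None,
        onLateAction = None,
        attemptTimeout = None,
        lateAfterTimeout = None,
        maximumRetries = None,
        retryDelay = None,
        failureAndRerunMode = None,
        maxActiveInstances = None
      )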

  49. case class AdpSqlDataNode(id: String, name: Option[String], database: AdpRef[AdpDatabase], table: String, selectQuery: Option[String], insertQuery: Option[String], precondition: Option[Seq[AdpRef[AdpPrecondition]]], onSuccess: Option[Seq[AdpRef[AdpSnsAlarm]]], onFail: Option[Seq[AdpRef[AdpSnsAlarm]]]) extends AdpDataPipelineAbstractObject with AdpDataNode with Product with Serializable

    Example:

    {
      "id" : "Sql Table",
      "type" : "MySqlDataNode",
      "schedule" : { "ref" : "CopyPeriod" },
      "table" : "adEvents",
      "selectQuery" : "select * from #{table} where eventTime >= '#{@scheduledStartTime.format('YYYY-MM-dd HH:mm:ss')}' and eventTime < '#{@scheduledEndTime.format('YYYY-MM-dd HH:mm:ss')}'"
    }
  50. class AdpTerminate extends AdpDataPipelineAbstractObject with AdpAction

    An action to trigger the cancellation of a pending or unfinished activity, resource, or data node. AWS Data Pipeline attempts to put the activity, resource, or data node into the CANCELLED state if it does not finish by the lateAfterTimeout value.

  51. case class AdpTsvDataFormat(id: String, name: Option[String], column: Option[Seq[String]], escapeChar: Option[String]) extends AdpDataPipelineAbstractObject with AdpDataFormat with Product with Serializable

    A delimited data format where the column separator is a tab character and the record separator is a newline character.

    column

    The structure of the data file. Use column names and data types separated by a space. For example:

    [ "Name STRING", "Score INT", "DateOfBirth TIMESTAMP" ]

    You can omit the data type when using STRING, which is the default. Valid data types: TINYINT, SMALLINT, INT, BIGINT, BOOLEAN, FLOAT, DOUBLE, STRING, TIMESTAMP

    escapeChar

    A character, for example "\", that instructs the parser to ignore the next character.
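
    A minimal construction sketch using the column example above; the id and name are illustrative, and the escape character is a single backslash written as a Scala escape sequence.

    val tsvFormat = AdpTsvDataFormat(
      id = "TsvFormat",
      name = Some("tsv"),
      column = Some(Seq("Name STRING", "Score INT", "DateOfBirth TIMESTAMP")),
      escapeChar = Some("\\")
    )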

Value Members

  1. object AdpJsonSerializer


    Serializes an AWS Data Pipeline object to JSON.

  2. object AdpParameterSerializer

  3. object AdpPipelineSerializer

  4. object AdpRef extends Serializable

