Source OneDrive

Purpose

Defines the specific source parameters for a OneDrive connected endpoint.

This Asset can be used by:

Asset type        Link
Input Processors  Stream Input Processor

Prerequisite

You need:

  • An MS Graph Connection that this source can use (see Connection under OneDrive Settings).

Configuration

Name & Description

  • Name : Name of the Asset. Spaces are not allowed in the name.

  • Description : Enter a description.

Inheritance chain of this Asset — If this Asset extends another, the inheritance chain is shown here. Click to navigate to any parent Asset in the chain.

Asset Usage: If other Assets use this Asset, the Asset Usage box shows how many times it is used and which parts reference it. Otherwise the box is not shown. Click to expand the box, then click an entry to follow the reference.

Required roles

If you are deploying to a Cluster that runs (a) Reactive Engine Nodes which have (b) specific Roles configured, you can restrict use of this Asset to the Nodes with matching Roles. To apply this restriction, enter the names of the Required Roles here. Otherwise, leave the field empty to match all Nodes (no restriction).

Throttling & Failure Handling

Throttling

These parameters control the maximum number of new stream creations per given time period.

Max. new streams — Maximum number of streams this source is allowed to open or process within the given time period.

Per — Time interval unit for the Max. new streams value.

info

Suitable values for these parameters depend on the use case. If your data arrives in low-frequency cycles, the values have little effect. In scenarios where many objects arrive within short time frames, review the default values and adapt them accordingly.
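To make the interplay of Max. new streams and Per more concrete, the following is a minimal sketch of a sliding-window limiter. It is not layline.io's implementation: the class StreamThrottle and its method try_open are hypothetical names, and whether the product counts in a sliding or a fixed window is an assumption.

```python
from collections import deque


class StreamThrottle:
    """Hypothetical sliding-window limiter: at most max_new_streams per period."""

    def __init__(self, max_new_streams: int, per_seconds: float):
        self.max_new_streams = max_new_streams
        self.per_seconds = per_seconds
        self._opened = deque()  # timestamps of recently opened streams

    def try_open(self, now: float) -> bool:
        # Forget stream creations that have left the time window.
        while self._opened and now - self._opened[0] >= self.per_seconds:
            self._opened.popleft()
        if len(self._opened) < self.max_new_streams:
            self._opened.append(now)
            return True
        return False  # limit reached, the stream has to wait


# Max. new streams = 2, Per = 1 second (illustrative values only)
throttle = StreamThrottle(max_new_streams=2, per_seconds=1.0)
results = [throttle.try_open(t) for t in (0.0, 0.1, 0.2, 1.05)]
```

In this sketch the third attempt is rejected because two streams were already opened within the same second; the fourth succeeds once the first creation has left the window.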

Backoff Failure Handling

These parameters define backoff timing intervals in case of failures. The system will progressively throttle down the processing cycle based on the configured minimum and maximum failure backoff boundaries.

Min. failure backoff — The minimum backoff time before retrying after a failure.

Unit — Time unit for the minimum backoff value.

Max. failure backoff — The maximum backoff time before retrying after a failure.

Unit — Time unit for the maximum backoff value.

Based on these values, the next processing attempt is delayed: starting at the minimum failure backoff interval, the wait time increases step by step up to the maximum failure backoff.

Reset after number of successful streams — Resets the failure backoff throttling after this many successful stream processing attempts.

Reset after time without failure streams — Resets the failure backoff throttling after this amount of time passes without any failures.

Unit — Time unit for the time-based backoff reset.

Whichever threshold is reached first, the stream count or the time without failures, resets the failure throttling once the system has returned to successful stream processing.
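The backoff and reset behavior described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the product's algorithm: the text only says the wait time increases "step by step", so the doubling used here is an assumption, and the class FailureBackoff with its methods is hypothetical.

```python
class FailureBackoff:
    """Hypothetical model of the backoff settings; the doubling step is an assumption."""

    def __init__(self, min_backoff, max_backoff, reset_after_successes, reset_after_seconds):
        self.min_backoff = min_backoff
        self.max_backoff = max_backoff
        self.reset_after_successes = reset_after_successes
        self.reset_after_seconds = reset_after_seconds
        self.current = 0.0        # current delay in seconds; 0 means no throttling
        self.successes = 0        # successful streams since the last failure
        self.last_failure = None  # time of the most recent failure

    def on_failure(self, now):
        # Start at the minimum, then grow step by step up to the maximum.
        if self.current == 0:
            self.current = self.min_backoff
        else:
            self.current = min(self.current * 2, self.max_backoff)
        self.successes = 0
        self.last_failure = now
        return self.current

    def on_success(self, now):
        if self.current == 0:
            return self.current
        self.successes += 1
        quiet_for = now - self.last_failure
        # Whichever threshold is reached first resets the backoff.
        if self.successes >= self.reset_after_successes or quiet_for >= self.reset_after_seconds:
            self.current = 0.0
            self.successes = 0
        return self.current


backoff = FailureBackoff(min_backoff=1.0, max_backoff=8.0,
                         reset_after_successes=3, reset_after_seconds=60.0)
delays = [backoff.on_failure(now=t) for t in range(5)]
after_successes = [backoff.on_success(now=t) for t in (5, 6, 7)]
```

With these illustrative values, five consecutive failures yield delays of 1, 2, 4, 8 and 8 seconds, and the third successful stream afterwards clears the backoff.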

Polling & Processing

This source is not a stream but an object-based storage source, which does not notify observers when new objects arrive. You therefore need to define how often the source is looked up (polled) for new objects to process.

You can choose between Fixed rate polling and Cron tab style polling.

Fixed rate

Use Fixed rate if you want to poll at constant and frequent intervals.

Polling interval [sec] — The interval in seconds at which the configured source is queried for new objects.

Cron tab

Use Cron tab if you want to poll at specific scheduled times. The expression follows the crontab-style syntax described in the Quartz Scheduler documentation.

You can also use the built-in Cron expression editor: click the calendar symbol on the right-hand side.

Configure your expression using the editor. The Next trigger times display at the top helps you visualize when the next triggers will fire. Press OK to store the values.
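For orientation, a Quartz-style cron expression consists of six whitespace-separated fields (seconds, minutes, hours, day of month, month, day of week) plus an optional seventh field for the year. The sketch below only labels the fields of two example expressions; it does not validate or schedule anything. The helper describe_fields is hypothetical, and whether layline.io accepts the optional year field is an assumption.

```python
QUARTZ_FIELDS = ("seconds", "minutes", "hours",
                 "day_of_month", "month", "day_of_week", "year")


def describe_fields(expression: str) -> dict:
    """Label the fields of a Quartz-style cron expression (no validation)."""
    parts = expression.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("expected 6 fields plus an optional year field")
    return dict(zip(QUARTZ_FIELDS, parts))


# Every 15 minutes, starting at minute 0 of each hour
every_15_minutes = describe_fields("0 0/15 * * * ?")

# At 02:30:00 from Monday to Friday
weekdays = describe_fields("0 30 2 ? * MON-FRI")
```

Note that Quartz expressions start with a seconds field, which classic five-field Unix crontab entries do not have.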

Polling timeout

Polling timeout [sec] — The time in seconds to wait before a polling request is considered failed. Set this high enough to account for endpoint responsiveness under normal operation.

Stable time

Stable time [sec] — The number of seconds that file statistics must remain unchanged before the file is considered stable for processing. Configuring this value enables stability checks before processing.
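A stability check of this kind can be pictured as comparing successive observations of the same file. The sketch below is an assumption-laden illustration: it treats size and last-modified time as the relevant "file statistics", and the function is_stable is a hypothetical name, not part of layline.io.

```python
def is_stable(observations, stable_seconds):
    """Return True if the newest statistics have been unchanged for stable_seconds.

    observations: list of (observed_at, size, last_modified), oldest first.
    """
    if not observations:
        return False
    latest_at, *latest_stats = observations[-1]
    unchanged_since = latest_at
    # Walk backwards until the statistics differ from the newest observation.
    for observed_at, *stats in reversed(observations):
        if stats != latest_stats:
            break
        unchanged_since = observed_at
    return latest_at - unchanged_since >= stable_seconds


# A file that was still growing at t=0 and has not changed since t=10
polls = [(0, 100, 5), (10, 200, 12), (20, 200, 12)]
stable_after_10 = is_stable(polls, stable_seconds=10)
stable_after_15 = is_stable(polls, stable_seconds=15)
```

Here the file counts as stable for a stable time of 10 seconds, but not yet for 15 seconds.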

Ordering

When listing objects from the source for processing, you can define the order in which they are processed:

  • Alphabetically, ascending
  • Alphabetically, descending
  • Last modified, ascending
  • Last modified, descending
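The four ordering options correspond to a sort key plus a direction. The following sketch shows this mapping; the RemoteObject type and order_objects helper are hypothetical, and details such as case sensitivity of the alphabetical order are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RemoteObject:
    name: str
    last_modified: float  # e.g. seconds since the epoch


# Option label -> (sort key, reverse flag)
ORDERINGS = {
    "Alphabetically, ascending": (lambda o: o.name, False),
    "Alphabetically, descending": (lambda o: o.name, True),
    "Last modified, ascending": (lambda o: o.last_modified, False),
    "Last modified, descending": (lambda o: o.last_modified, True),
}


def order_objects(objects, ordering):
    key, reverse = ORDERINGS[ordering]
    return sorted(objects, key=key, reverse=reverse)


listing = [RemoteObject("b.csv", 1.0), RemoteObject("a.csv", 3.0), RemoteObject("c.csv", 2.0)]
by_name = [o.name for o in order_objects(listing, "Alphabetically, ascending")]
newest_first = [o.name for o in order_objects(listing, "Last modified, descending")]
```

Choosing "Last modified, ascending" processes the oldest objects first, which is typically what you want when the arrival order matters.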

Reprocessing mode

The Reprocessing mode setting controls how layline.io's Access Coordinator handles previously processed sources that are re-ingested.

  • Manual access coordinator reset — Any source element processed and stored in layline.io's history requires a manual reset in the Sources Coordinator before reprocessing occurs (default mode).
  • Automatic access coordinator reset — Allows automatic reprocessing of already processed and re-ingested sources as soon as the respective input source has been moved into the configured done or error directory.
  • When input changed — Behaves like Manual access coordinator reset, but also checks whether the source has potentially changed — i.e., the name is identical but the content differs. If the content has changed, reprocessing starts without manual intervention.
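The three modes can be summarized as a decision about whether a re-ingested source is processed again. The sketch below is one reading of the descriptions above, not the Access Coordinator's actual logic: the function should_process, its parameters, and the mode identifiers are hypothetical, and it assumes that a manual reset permits reprocessing in every mode.

```python
def should_process(mode, *, in_history, manually_reset=False,
                   moved_to_done_or_error=False, content_changed=False):
    """Hypothetical helper; mode is 'manual', 'automatic' or 'when_input_changed'."""
    if not in_history:
        return True   # never seen before: process normally
    if manually_reset:
        return True   # assumption: a manual reset works in every mode
    if mode == "automatic":
        # Reprocess once the earlier input has been moved to done or error.
        return moved_to_done_or_error
    if mode == "when_input_changed":
        # Same name, different content: reprocess without manual intervention.
        return content_changed
    return False      # 'manual': wait for a reset


blocked = should_process("manual", in_history=True)
changed = should_process("when_input_changed", in_history=True, content_changed=True)
```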

Wait for processing clearance

When Wait for processing clearance is activated, new input sources remain unprocessed in the input directory until either:

  • A manual clearance is given through Operations, or
  • A JavaScript processor executes AccessCoordinator.giveClearance(source, stream, timeout?)

OneDrive Settings

Configure the parameters for your OneDrive endpoint:

Connection

Use the drop-down list to select the MS Graph Connection that should support this OneDrive configuration. If it does not exist, you need to create it first.

info

Your MS Graph Connection must have the following scopes configured:

  • Sites.ReadWrite.All
  • Files.ReadWrite.All

Drive

The following settings define the basic location information to read OneDrive data from:

  • Drive name or ID : ID or name of the OneDrive drive you want to connect to.

Folders

This source requires the definition of Folders.

Use + ADD A FOLDER for entering the configuration details.

  • Folder setup name : Name of the Folder. Spaces are not allowed in the name.
  • Folder setup description : Enter a description.

Each Folder consists of the definition of three directories:

  1. Input Directory : The directory to read new files from.
  2. Done Directory : The directory to which read files are moved after reading.
  3. Error Directory : Files which caused problems during processing are moved to the Error Directory for further analysis.

If the Source should collect data from more than one Folder structure, you can add multiple Folder configurations.
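As a mental model, each Folder configuration is a named group of three directories, and a Source holds a list of them. The sketch below expresses that structure as a data class. It is not a layline.io configuration format: the FolderSetup type, its field names, and the example paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FolderSetup:
    """Hypothetical model of one Folder: a name plus three directories."""
    name: str
    input_directory: str
    done_directory: str
    error_directory: str
    description: str = ""

    def __post_init__(self):
        # The documentation states that spaces are not allowed in the name.
        if " " in self.name:
            raise ValueError("Folder setup name must not contain spaces")


# A Source collecting from two Folder structures (example paths are made up)
folders = [
    FolderSetup("invoices", "/invoices/in", "/invoices/done", "/invoices/error"),
    FolderSetup("orders", "/orders/in", "/orders/done", "/orders/error"),
]
```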

Input Directory

  • Input Directory : The directory to read files from. The path of the directory must be accessible to the Reactive Engine trying to access this Source. You can use ${...} macros to expand variables defined in environment variables.

  • Filter regular expression : Regular expression to filter which files in the directory are pulled.

  • File prefix regular expression : A regular expression that is applied to the beginning of a file name. For example, XYZ. restricts reading to files whose names start with XYZ followed by any character.

  • File suffix regular expression : A regular expression that is applied to the end of a file name. For example, .zip restricts reading to files whose names end with zip preceded by any character.

  • Include sub-directories : Also scan the sub-directories of the input directory.

  • Enable housekeeping : Lets you apply housekeeping rules to files within the input directory. You can configure the options you require.
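The filter settings above can be tried out with ordinary regular expressions. The following sketch uses Python's re module to approximate them; the helper accepts is hypothetical, and the exact matching semantics in layline.io (regex dialect, full match versus partial match) are assumptions.

```python
import re


def accepts(filename, filter_regex=None, prefix_regex=None, suffix_regex=None):
    """Hypothetical filter: every configured expression must match."""
    if filter_regex and not re.search(filter_regex, filename):
        return False
    # Prefix: anchored at the beginning of the file name.
    if prefix_regex and not re.match(prefix_regex, filename):
        return False
    # Suffix: anchored at the end of the file name.
    if suffix_regex and not re.search(f"(?:{suffix_regex})$", filename):
        return False
    return True


prefix_hit = accepts("XYZ_2024.csv", prefix_regex="XYZ.")
prefix_miss = accepts("ABC_2024.csv", prefix_regex="XYZ.")
loose_suffix = accepts("unzip", suffix_regex=".zip")     # "." matches any character
strict_suffix = accepts("unzip", suffix_regex=r"\.zip")  # escaped dot: literal "."
```

Because an unescaped dot matches any character, a suffix expression of .zip also accepts a name such as unzip. Escape the dot if only a literal .zip extension should match.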

Done Directory

  • Done Directory : The directory to which files are moved when fully processed. The path of the directory must be accessible to the Reactive Engine trying to access this Source. You can use ${...} macros to expand variables defined in environment variables.

  • Done prefix : Prefix to add to the filename of the processed file when it is moved to the done directory. For example, done_ adds done_ to the beginning of the filename.

  • Done suffix : Suffix to add to the filename of the processed file when it is moved to the done directory. For example, _done adds _done to the end of the filename.

  • "File already exists"-Handling : Define your required handling in case the file already exists in the done-directory.

  • Enable housekeeping : Lets you apply housekeeping rules to files within the done directory. You can configure the options you require.
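The effect of the prefix and suffix settings on the target filename can be sketched in a few lines. This follows a literal reading of the descriptions, in which the suffix is appended after the complete filename including its extension; that placement is an assumption, and the helper target_name is hypothetical.

```python
def target_name(filename: str, prefix: str = "", suffix: str = "") -> str:
    """Hypothetical helper: build the filename used after the move."""
    # Assumption: the suffix goes after the whole name, extension included.
    return f"{prefix}{filename}{suffix}"


with_prefix = target_name("report.csv", prefix="done_")
with_suffix = target_name("report.csv", suffix="_done")
```

The same scheme applies to the Error prefix and Error suffix settings of the Error Directory.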

Error Directory

  • Error Directory : The directory to which files are moved in case of a problem with the file during processing. The path of the directory must be accessible to the Reactive Engine trying to access this Source. You can use ${...} macros to expand variables defined in environment variables.

  • Error prefix : Prefix to add to the filename of the processed file when it is moved to the error directory. For example, error_ adds error_ to the beginning of the filename.

  • Error suffix : Suffix to add to the filename of the processed file when it is moved to the error directory. For example, _error adds _error to the end of the filename.

  • "File already exists"-Handling : Define your required handling in case the file already exists in the error-directory.

  • Enable housekeeping : Lets you apply housekeeping rules to files within the error directory. You can configure the options you require.
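The directory settings for input, done and error all accept ${...} macros. The sketch below illustrates the general idea of such an expansion with a plain substitution over a variable mapping. The function expand_macros is hypothetical, and the macro grammar supported by layline.io may differ from this simple form.

```python
import os
import re

_MACRO = re.compile(r"\$\{([^}]+)\}")


def expand_macros(path: str, variables=None) -> str:
    """Hypothetical expansion of ${NAME} placeholders in a directory path."""
    variables = os.environ if variables is None else variables

    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]

    return _MACRO.sub(replace, path)


# Example variable and paths are made up for illustration.
error_dir = expand_macros("${BASE_DIR}/error", {"BASE_DIR": "/invoices"})
```

Defining the base path once and referencing it in all three directories keeps the Folder definitions consistent across environments.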
