Streaming Transfer

Streaming Overview


With a streaming UID transfer, your AWS S3 bucket will be filled at regular time intervals with new and updated Match Partner UIDs coming from the cascading process.
The interval at which ID5 pushes files may change over time. Build your system to look for new files and process them as they become available, rather than relying on an expected interval from ID5. We recommend checking for new files every 5 minutes.
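As a rough illustration of processing files as they appear rather than on a fixed schedule, the sketch below tracks which object keys have already been handled. The names `poll_once`, `list_keys`, and `process` are hypothetical; `list_keys` stands in for whatever call your system uses to list the bucket, and the in-memory `seen` set would normally be replaced by durable storage.

```python
from typing import Callable, Iterable, List, Set


def poll_once(
    list_keys: Callable[[], Iterable[str]],
    seen: Set[str],
    process: Callable[[str], None],
) -> List[str]:
    """Process every .csv key that has not been handled yet.

    list_keys is a hypothetical callable returning the current object keys
    in the bucket; how the listing is obtained is up to your system.
    """
    # Keys embed YYYYMMDD and the file timestamp, so sorting them as
    # strings also orders them chronologically within one prefix.
    new_keys = sorted(
        key for key in list_keys() if key.endswith(".csv") and key not in seen
    )
    for key in new_keys:
        process(key)
        # Mark the key as done only after processing succeeds.
        seen.add(key)
    return new_keys
```

Calling `poll_once` from a scheduler every 5 minutes follows the recommendation above without assuming when ID5 delivers files.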
We support two file delivery methods: Single File and Per-Partner Files. During integration, we’ll discuss with you which method is best and configure your account accordingly. You may change the delivery method later if you’d like, but we can only deliver files using one method at a time.

Single File Delivery Method

Data will be delivered to a directory called /incremental at the root of your bucket, with one subdirectory per UTC day (/YYYYMMDD). Each subdirectory contains files named with the timestamp of the run that produced them. For example:
S3://[bucket]/incremental/20181106/000000.csv
S3://[bucket]/incremental/20181106/003000.csv
...
S3://[bucket]/incremental/20181106/233000.csv
S3://[bucket]/incremental/20181107/000000.csv
S3://[bucket]/incremental/20181107/003000.csv
...
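If you need the run time of a file, it can be recovered from its key. The sketch below assumes the file name is an HHMMSS timestamp in UTC, which is inferred from the examples above rather than stated explicitly; `parse_incremental_key` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_incremental_key(key: str) -> datetime:
    """Return the UTC run time encoded in a key such as
    incremental/20181106/003000.csv.

    Assumption: the file name is HHMMSS in UTC (inferred from the examples).
    """
    parts = key.strip("/").split("/")
    day, filename = parts[-2], parts[-1]
    if not filename.endswith(".csv"):
        raise ValueError(f"not a .csv key: {key!r}")
    stamp = filename[: -len(".csv")]
    # strptime raises ValueError if the day or time part is malformed.
    run_time = datetime.strptime(day + stamp, "%Y%m%d%H%M%S")
    return run_time.replace(tzinfo=timezone.utc)
```

Because only the last two path segments are used, the same helper would also accept keys that carry an extra directory level before the day.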

File Format

Each .csv file contains, in a pipe-separated format, the mappings between your UIDs and the requested Match Partners’ UIDs that have changed since the last time we ran our streaming job. The first line of the file lists the Match Partners by their Global Vendor List ID. (Available Match Partners are defined by contract between you and ID5. To change the list of Match Partners, please reach out to your ID5 representative.) Each subsequent line represents a single user, identified by your UID or the ID5 ID, followed by the UIDs of all requested Match Partners that had changes.
If a column is left blank (""), this does NOT mean the mapping for this user does not exist, but rather it means there is no update. You should treat these files as purely additive to your existing mappings, not as a replacement.
The first file ID5 delivers to you will not automatically contain the entire match table; instead it will contain any IDs collected/changed since the last time we ran our streaming job. If you would like to receive the entire match table, let us know.
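The additive rule for blank columns can be sketched as follows. The function name `apply_incremental` and the in-memory dictionary layout (your UID mapped to a dictionary of GVL ID to partner UID) are assumptions made for illustration, not part of the ID5 delivery.

```python
from typing import Dict

# Hypothetical in-memory layout: {your_uid: {partner_gvlid: partner_uid}}
MappingTable = Dict[str, Dict[int, str]]


def apply_incremental(existing: MappingTable, updates: MappingTable) -> MappingTable:
    """Merge one incremental file into the existing mappings in place."""
    for uid, partner_uids in updates.items():
        row = existing.setdefault(uid, {})
        for gvlid, partner_uid in partner_uids.items():
            # A blank value means "no update", so any stored mapping is kept.
            if partner_uid != "":
                row[gvlid] = partner_uid
    return existing
```

With this approach a blank column never deletes a stored value; only non-blank values overwrite it.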

Header Values

Column               Type     Description
Source GVLID         Integer  Your GVL ID or the ID5 GVL ID (131)
MatchPartner1 GVLID  Integer  Match Partner 1’s GVL ID
MatchPartner2 GVLID  Integer  Match Partner 2’s GVL ID (if applicable)



Row Values

Column            Type    Description
UID               String  Your UID or the ID5 ID for this user
MatchPartner1UID  String  Match Partner 1’s UID for this user, surrounded by double quotes
MatchPartner2UID  String  Match Partner 2’s UID for this user, surrounded by double quotes (if applicable)




Another way to look at this format is as follows:
[YOUR GVLID]|[GVLID1]...
"[YOUR UIDa]"|"[UID1]"...
"[YOUR UIDb]"|"[UID2]"...
"[YOUR UIDc]"|"[UID3]"...
"[YOUR UIDz]"|"[UIDn]"...
where |[GVLID1] and |[UIDn] repeat for each Match Partner.
Your code to ingest the data file should be able to handle new Match Partners or a different order of Match Partners at any time. This way, if there’s a commercial request to add more partners, we don’t need to coordinate a release to ensure your processes don’t break.
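One way to stay robust to new partners or a changed column order is to key every value by the GVL ID found in the header line instead of by column position. The sketch below uses Python's standard csv module with a pipe delimiter; the function name `parse_single_file` and the returned structure are illustrative assumptions.

```python
import csv
import io
from typing import Dict, Tuple


def parse_single_file(text: str) -> Tuple[int, Dict[str, Dict[int, str]]]:
    """Parse a Single File delivery into (source GVL ID, rows).

    rows maps each UID to {partner GVL ID: partner UID}; a blank value
    is kept as "" so the caller can treat it as "no update".
    """
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    header = next(reader)
    source_gvlid = int(header[0])
    # Partner columns are taken from the header on every file, so added
    # or reordered partners need no code change.
    partner_gvlids = [int(value) for value in header[1:]]
    rows: Dict[str, Dict[int, str]] = {}
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows[record[0]] = dict(zip(partner_gvlids, record[1:]))
    return source_gvlid, rows
```

For a file whose header is `35|20|45|109`, each parsed row becomes a dictionary keyed by 20, 45, and 109, regardless of the order in which those columns appear.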

Example Output File

Assuming your GVL ID is 35 and you are matching with partners with GVL IDs 20, 45, and 109:
$ cat /incremental/20180305/003000.csv
35|20|45|109
"AAAAAA"|"111111"|"222222"|"333333"
"BBBBBB"|"444444"|"555555"|"666666"
"CCCCCC"|"777777"|""|"999999"

Per-Partner Delivery Method

In this delivery method, data will be delivered to a directory called /incremental at the root of your bucket with subdirectories broken out by Match Partner (directory names based on the partner’s Global Vendor List ID), then folders per UTC day /YYYYMMDD/, each containing files named with a timestamp throughout the day. For instance:
S3://[bucket]/incremental/[match partner 1 GVL ID]/20171106/000000.csv
S3://[bucket]/incremental/[match partner 1 GVL ID]/20171106/003000.csv
...
S3://[bucket]/incremental/[match partner 1 GVL ID]/20171106/233000.csv
S3://[bucket]/incremental/[match partner 1 GVL ID]/20171107/000000.csv
S3://[bucket]/incremental/[match partner 1 GVL ID]/20171107/003000.csv
...
S3://[bucket]/incremental/[match partner 1 GVL ID]/20171107/180000.csv
S3://[bucket]/incremental/[match partner 2 GVL ID]/20171106/000000.csv
...
S3://[bucket]/incremental/[match partner 2 GVL ID]/20171106/233000.csv
Each .csv file contains the incremental mapping between your UIDs and the requested Match Partner’s UIDs in a pipe-separated format.

The Match Partner directories will be named based on the partner’s Global Vendor List ID. Available Match Partners are defined by contract between you and ID5. To change the list of Match Partners, please reach out to your ID5 representative.

File Format

Column             Type     Description
UID                String   Your UID for this user
MatchPartnerGVLID  Integer  The Global Vendor List ID of the Match Partner
MatchPartnerUID    String   The Match Partner’s UID for this user

Example Output File

$ cat /incremental/matchpartner1GVLID/20180305/000300.csv
550e8400-e29b-41d4-a716-446655440000|matchpartner1GVLID|100000187421490458
d2a8378f-fe56-4ec2-96d1-3c05df02bb48|matchpartner1GVLID|1000002629570693845
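A per-partner file can be read as three pipe-separated fields per line. The sketch below is a minimal reader under that assumption; `parse_partner_file` is a hypothetical name, and the GVL ID 20 used in the usage note is only a stand-in for the placeholder shown in the example. Using the csv module means values are handled whether or not they arrive wrapped in double quotes.

```python
import csv
import io
from typing import List, Tuple


def parse_partner_file(text: str) -> List[Tuple[str, int, str]]:
    """Return (UID, Match Partner GVL ID, Match Partner UID) tuples."""
    mappings: List[Tuple[str, int, str]] = []
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    for record in reader:
        if not record:  # skip blank lines
            continue
        if len(record) != 3:
            raise ValueError(f"expected 3 fields, got {len(record)}: {record!r}")
        uid, gvlid, partner_uid = record
        mappings.append((uid, int(gvlid), partner_uid))
    return mappings
```

For instance, the line `550e8400-e29b-41d4-a716-446655440000|20|100000187421490458` would yield the UID, the integer 20, and the partner UID kept as a string.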

Mapping Table Refreshes

In addition to the incremental updates that we push throughout the day, ID5 can also stream the full match table to you on a regular basis. This ensures two things:
  1. If any data is lost during the streaming process, the full extract recovers it, so you do not have to wait for a change from that user to be pushed.
  2. If any users have opted out or had their mappings expire, you can remove them from your mapping tables, since they will no longer be included in the full extract.
If you would like to receive these refreshes, please check with your ID5 representative.
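The two effects listed above can be sketched as a reconciliation step: users missing from the full extract are dropped, and every user present is reset to the extract's values. The function name `reconcile_with_full_extract` and the dictionary layout are hypothetical, and the sketch assumes that a blank value in a full extract means no mapping exists for that partner, which is not stated above.

```python
from typing import Dict, Set

# Hypothetical in-memory layout: {your_uid: {partner_gvlid: partner_uid}}
MappingTable = Dict[str, Dict[int, str]]


def reconcile_with_full_extract(
    existing: MappingTable, full_extract: MappingTable
) -> Set[str]:
    """Bring existing in line with a full extract; return the removed UIDs."""
    # Users absent from the full extract have opted out or expired.
    removed = set(existing) - set(full_extract)
    for uid in removed:
        del existing[uid]
    for uid, partner_uids in full_extract.items():
        # Assumption: in a full extract, a blank value means no mapping.
        existing[uid] = {
            gvlid: partner_uid
            for gvlid, partner_uid in partner_uids.items()
            if partner_uid != ""
        }
    return removed
```

A production version would also need to decide how to treat incremental files that arrive while a full extract is being processed.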

File Location and Format

The files follow the same format as the incremental updates above, depending on whether you’ve chosen Single File or Per-Partner Files. The data files are pushed to a different location from the incremental files, however, so that you can process weekly refreshes separately. The files will be pushed to:

Single File Location

S3://[bucket]/full-extracts/[datetime].csv

Per-Partner File Location

S3://[bucket]/full-extracts/[match partner GVLID]/[datetime].csv

Cleaning Up / Deleting Old Files

By default, ID5 does not delete any files we place in the S3 bucket. When we push files to the bucket, we perform a sync operation. This means that if you have deleted a file from the S3 bucket but it still exists on the ID5 servers, we will push the file to the bucket again.

We keep files on our servers for approximately 30 days. If your ETL process includes deleting files from the bucket, please let us know so we can work together on a solution that meets your needs.

We recommend that you only delete files older than 30 days to avoid any issues. If you’d like, we can also automatically delete old files from the bucket once we have removed them from our servers.
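If you do clean up the bucket yourself, one conservative way to pick candidates is to use the UTC day in each incremental key and keep anything that could still be within the 30-day window. The helper name `keys_safe_to_delete` is hypothetical, and the sketch only selects keys; it does not delete anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def keys_safe_to_delete(
    keys: Iterable[str], now: datetime, min_age_days: int = 30
) -> List[str]:
    """Return keys whose YYYYMMDD directory is fully older than min_age_days."""
    cutoff = now - timedelta(days=min_age_days)
    old_keys: List[str] = []
    for key in keys:
        parts = key.strip("/").split("/")
        try:
            day = datetime.strptime(parts[-2], "%Y%m%d").replace(tzinfo=timezone.utc)
        except (IndexError, ValueError):
            # Skip keys without a day directory, e.g. full extracts.
            continue
        # A file may have been written at any time during its UTC day,
        # so compare the end of that day against the cutoff.
        if day + timedelta(days=1) < cutoff:
            old_keys.append(key)
    return old_keys
```

Passing a timezone-aware `now`, such as `datetime.now(timezone.utc)`, keeps the comparison in UTC, matching the directory naming.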