COMPLEXTRACTORS! (ADVANCED EXTRACTORS)

TYPES OF EXTRACTORS (PYCLOWDER 2)
• trigger on a single file
  • can specify type
• trigger on dataset change
  • add/remove file
  • cannot specify type (yet)
• trigger on metadata change
  • add/remove metadata
• trigger on custom combos
  • extractors-rulechecker
  • is really dataset > file added
  • have a PSQL db at your disposal

TRIGGER ON SINGLE FILE
• check_message() resource
• process_message() resource
  • includes locally accessible path and pointer to dataset
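
A minimal sketch of how the two hooks might divide the work for a single-file extractor. It deliberately does not import pyclowder: the function signatures are simplified, and the resource keys ("file_ext", "local_paths", "parent") and the "download"/"ignore" return strings are assumptions for illustration, not the library's confirmed interface.

```python
# Illustrative stand-in for a single-file extractor; not the pyclowder API.
# The resource keys used here ("file_ext", "local_paths", "parent") are assumed.

ACCEPTED_EXTENSIONS = {".csv"}  # hypothetical: the file type this extractor wants


def check_message(resource):
    """Decide cheaply, before any download, whether this file is worth processing."""
    if resource.get("file_ext") in ACCEPTED_EXTENSIONS:
        return "download"
    return "ignore"


def process_message(resource):
    """Do the real work using the locally accessible path and the dataset pointer."""
    local_path = resource["local_paths"][0]
    dataset_id = resource["parent"]["id"]
    return {"processed": local_path, "dataset": dataset_id}


# Hypothetical message payload for one uploaded file.
example = {
    "id": "file-1",
    "file_ext": ".csv",
    "local_paths": ["/tmp/file-1.csv"],
    "parent": {"type": "dataset", "id": "ds-9"},
}

if check_message(example) == "download":
    result = process_message(example)
```

The point of the split is that check_message() can reject a file from its description alone, so only files that pass are fetched and handed to process_message().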

TRIGGER ON DATASET CHANGE
• check_message() resource
• process_message()

TRIGGER ON DATASET CHANGE CAVEATS
• receives message every time a file is added to a dataset
• if extractor requires input files A, B, C, D, E then you will see at least 5 messages as those are added
• have to make sure you don't truly process until all 5 are present!
• the list of files is fetched after message, not as part of it - it will have files that were added later!
• we can account for this...
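
One way to hold off until every input is present is to test the dataset's current file list inside check_message(). The sketch below is illustrative only: the required suffixes, the "files" and "filename" keys, and the return strings are all assumptions, not the library's confirmed interface.

```python
# Hypothetical required inputs A-E, identified here by filename suffix.
REQUIRED_SUFFIXES = ("_A.bin", "_B.bin", "_C.bin", "_D.bin", "_E.bin")


def missing_inputs(filenames, required=REQUIRED_SUFFIXES):
    """Return the required suffixes that no file in the dataset satisfies yet."""
    return [s for s in required if not any(name.endswith(s) for name in filenames)]


def check_message(resource):
    """Skip the message unless all required inputs are already in the dataset."""
    names = [f["filename"] for f in resource.get("files", [])]
    if missing_inputs(names):
        return "ignore"
    return "download"


# Dataset as seen after the third upload, then after the fifth.
partial = {"files": [{"filename": "scan%s" % s} for s in REQUIRED_SUFFIXES[:3]]}
complete = {"files": [{"filename": "scan%s" % s} for s in REQUIRED_SUFFIXES]}
```

With these inputs, the partial dataset is ignored and only the complete one proceeds to processing.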

TRIGGER ON DATASET CHANGE CHECK IF THIS MESSAGE IS THE LATEST ONE
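
The code shown on this slide is not preserved in the text, so the following is only one possible approach. Because the file list is fetched after the message arrives, several messages can each see the complete set of files; comparing the file that triggered the message against the newest file in the list lets only the last message proceed. The keys "triggering_file_id" and "date_created" are hypothetical names, and timestamps are assumed to be ISO 8601 strings in a single format so that string comparison matches time order.

```python
def is_latest_message(resource):
    """True only if the file that triggered this message is the newest in the dataset."""
    files = resource.get("files", [])
    if not files:
        return False
    # ISO 8601 strings in one consistent format sort chronologically.
    newest = max(files, key=lambda f: f["date_created"])
    return newest["id"] == resource.get("triggering_file_id")


# Hypothetical dataset contents at the time the list is fetched.
files = [
    {"id": "a", "date_created": "2017-05-01T10:00:00Z"},
    {"id": "b", "date_created": "2017-05-01T10:00:05Z"},
    {"id": "c", "date_created": "2017-05-01T10:00:09Z"},
]

stale_message = {"files": files, "triggering_file_id": "b"}
latest_message = {"files": files, "triggering_file_id": "c"}
```

Here the message triggered by file "b" is skipped because "c" arrived afterwards, so the dataset is processed once rather than once per message.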

TRIGGER ON METADATA CHANGE
• can trigger on both files and datasets
• includes the metadata that was added or removed
• useful for extractors that react to updated metadata parameters
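
A small sketch of an extractor reacting only to the parameters it cares about. The "metadata" key on the resource and the watched parameter name are assumptions for illustration.

```python
WATCHED_KEYS = {"threshold"}  # hypothetical parameter this extractor depends on


def changed_parameters(resource):
    """Return the watched parameters present in the metadata carried by the message."""
    metadata = resource.get("metadata") or {}
    return {key: value for key, value in metadata.items() if key in WATCHED_KEYS}


def check_message(resource):
    """Re-run only when a watched parameter was part of the metadata change."""
    return "download" if changed_parameters(resource) else "ignore"


relevant = {"type": "file", "metadata": {"threshold": 0.8, "note": "recalibrated"}}
irrelevant = {"type": "dataset", "metadata": {"note": "typo fixed"}}
```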

TRIGGER ON CUSTOM COMBOS EXTRACTORS-RULECHECKER
• rules.py - actual rule functions to execute, akin to check_message()
• rule_extractor_map.json - mapping of rule functions to which extractors they trigger
• accumulate information across datasets into a DB
• when criteria are met, trigger a target extractor directly
  • synthesize two distinct sensor datasets by timestamp
  • extractors at the collection level - e.g. stitching a day of images
  • sufficient bounding box coverage of target area is achieved
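
To make the accumulate-then-trigger idea concrete, here is a sketch of a rule that pairs datasets from two sensors by timestamp. It is not taken from rules.py: the sensor names, dataset fields, and target extractor name are hypothetical, and a plain dict stands in for the database.

```python
REQUIRED_SENSORS = {"sensorA", "sensorB"}  # hypothetical sensor names

progress = {}  # stands in for the rulechecker's database


def pair_by_timestamp_rule(dataset):
    """Record the dataset; return a trigger request once both sensors have reported."""
    seen = progress.setdefault(dataset["timestamp"], {})
    seen[dataset["sensor"]] = dataset["id"]
    if REQUIRED_SENSORS.issubset(seen):
        return {
            "extractor": "hypothetical.synthesizer",
            "dataset_ids": sorted(seen.values()),
        }
    return None  # criteria not met yet; keep accumulating


first = pair_by_timestamp_rule(
    {"id": "ds-1", "sensor": "sensorA", "timestamp": "2017-05-01T10:00"}
)
second = pair_by_timestamp_rule(
    {"id": "ds-2", "sensor": "sensorB", "timestamp": "2017-05-01T10:00"}
)
```

The first call only records progress; the second completes the pair and returns the request that would be sent to the target extractor.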

TRIGGER ON CUSTOM COMBOS RULE_EXTRACTOR_MAP.JSON

TRIGGER ON CUSTOM COMBOS RULE_UTILS.PY
• getFilesByType(resource, extension)
• checkRequiredFiles(resource, rulemap, extractor)
• submitProgressToDB(rule, extractor, target_output, id_list, jsonobj)
  • target_output is just a unique key name
  • id_list is a list of UUIDs you care about
  • jsonobj is whatever
• retrieveProgressFromDB(target_output)
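
The submit/retrieve pair amounts to a keyed progress store. The sketch below imitates that behaviour with an in-memory SQLite table instead of the PSQL database, and uses its own function names (submit_progress, retrieve_progress) with a reduced argument list, so it should be read as an illustration of the idea rather than the rule_utils.py implementation.

```python
import json
import sqlite3

# In-memory stand-in for the rulechecker's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE progress (target_output TEXT PRIMARY KEY, id_list TEXT, jsonobj TEXT)"
)


def submit_progress(target_output, id_list, jsonobj):
    """Store (or overwrite) the progress recorded under a unique key name."""
    conn.execute(
        "INSERT OR REPLACE INTO progress VALUES (?, ?, ?)",
        (target_output, json.dumps(id_list), json.dumps(jsonobj)),
    )
    conn.commit()


def retrieve_progress(target_output):
    """Return the stored progress for a key, or None if nothing was recorded."""
    row = conn.execute(
        "SELECT id_list, jsonobj FROM progress WHERE target_output = ?",
        (target_output,),
    ).fetchone()
    if row is None:
        return None
    return {"ids": json.loads(row[0]), "data": json.loads(row[1])}


# Hypothetical key and payload: one of two expected image halves has arrived.
submit_progress("stitched_2017-05-01", ["uuid-1"], {"left": "uuid-1"})
submit_progress("stitched_2017-05-01", ["uuid-1", "uuid-2"], {"left": "uuid-1", "right": "uuid-2"})
stored = retrieve_progress("stitched_2017-05-01")
```

Keying on target_output means each rule invocation can read what earlier messages contributed and decide whether the criteria are now met.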

TRIGGER ON CUSTOM COMBOS RULES.PY

TRIGGER ON CUSTOM COMBOS NO RULECHECKER
[Diagram: 90 files uploaded (15 image pairs = 30, 15 music sample sets = 30, 15 audio recordings = 30). RabbitMQ sends 90 messages to each of three queues: audiobook generator, music synthesizer, image combiner.]

TRIGGER ON CUSTOM COMBOS YES RULECHECKER
[Diagram: 90 files uploaded (15 image pairs = 30, 15 music sample sets = 30, 15 audio recordings = 30). RabbitMQ sends 90 messages to the rulechecker queue; the audiobook, synthesizer, and combiner rules each pass on 30 messages (audio, music, image).]

FUTURE STUFF
• unify file and dataset extractor concepts
• return all necessary dataset info with file extraction
• add UI element for datasets for extraction events
• support collection-level extractors more explicitly
• Clowder as workflow engine...? HOW FAR IS TOO FAR