CS 5412 / LECTURE 9: MACHINE LEARNING FOR SMART FARMS
Ken Birman, Spring 2019
http://www.cs.cornell.edu/courses/cs5412/2018sp

WE HEARD ABOUT AIR TRAFFIC CONTROL…
Can we apply our insights in other settings?
Let’s review some smart-farming scenarios. The goal is to see whether they more or less map to this model: sensors, a Function Server running stateless functions, and a collection of snappy services that can be stateful and include machine-intelligence components.

THE BIG BET: IOT CAN RESHAPE THE WAY MACHINE LEARNING IS DONE
Machine learning for IoT settings has demanding time deadlines not seen in traditional cloud systems. Moreover, the amount of data on the IoT devices could be vastly more than we can hope to download.
Our goal today? To understand the resulting flow of data and computing.
• Data sets are so large in these settings that only really smart management of flows can yield a good solution.
• This shapes a view focused on the pattern of computation in IoT.

WHY NOT STICK WITH THE CLOUD “AS IS”?
Until now, big data computations have run in big “back-end” systems like the famous MapReduce/Hadoop framework, or on high-performance supercomputers. Big data processing was mostly done in batches, offline.
The IoT model demands instantaneous mobile intelligence, vision, speech understanding, and control of devices. A batched, offline model won’t work.

TODAY: A VERY “LONG” PIPELINE
[Figure: a pipeline running from data acquisition, to the Global File System (GFS), to Hadoop jobs. Machine learning typically lives at the back. The delay grows along the pipeline: milliseconds, then seconds, then hours.]

NEW: MOVE ML TO THE EDGE OF THE CLOUD
[Figure: the same pipeline (data acquisition, Global File System (GFS), Hadoop jobs; delay of milliseconds, then seconds, then hours). ML was at the back. We move data classification and some aspects of learning to the edge, where the delay is milliseconds.]

FARMBEATS: MICROSOFT’S “THINK WITH OUR HANDS” APPROACH
It is hard to just guess, so Microsoft decided to build an IoT solution.

A PRODUCT… AND A PROCESS
Like the French air traffic project, Microsoft has brilliant technical leaders. They set out to be incremental, to create new things only when needed, and to validate each step.
But smart farming also pushes the envelope and challenges them to think outside the standard cloud “box”.

SMART MONITORING OF CROPS
Field of oats, or hay
• How is the crop growing?
• Are there signs of drought / insect / virus / fungal / bacterial issues?
  ◦ If so, can we diagnose the exact problem?
  ◦ If we can, what treatment is needed, and exactly where to apply it?
• Can we learn from this and improve our seed choice for next year?
• Where should we fertilize or irrigate?

SMART HERD MANAGEMENT
Dairy: cow health and monitoring
• Which way should we point the camera? When to take photos/video?
• How much milk did each cow produce, and of what quality?
• What did it eat, and how was its appetite?
• How much time did it spend ruminating, or sleeping?
• Which cows need routine medical attention?
• Is a cow close to giving birth? Is it likely to need emergency help?

SMART DAIRY
Milk processing, yoghurt and cheese making
• Must monitor temperature and pH
• Need to sterilize properly using the correct strength of product, then rinse it off
• Watch for stuck or runaway fermentations
• Check samples for unwanted bacteria, like Listeria (very dangerous!)
• Maintain a secure and tamperproof audit trail (Blockchain?)

SMART WASTE DISPOSAL
What about all the runoff and farm waste?
• Why not collect it and reprocess it for valuable secondary products?
  ◦ Manure contains nitrogen and phosphorus, which can be used to create fertilizer
  ◦ Waste water can be captured and used for irrigation
  ◦ Undigested material can be transformed to “bio oil” by heating at high pressure
  ◦ Residual material after treatment can be composted and plowed back into fields
• Much of the problem with algae blooms could be eliminated by such steps, and farms could also earn more (or spend less) by doing so

GEEKY STUFF
• Recognize cow moods, relate cow emotional state to milk production
• Optimize drone flights over complex terrain to “sail on the wind” and save power
• Develop a multispectral image analysis to interpret signs of crop damage
• Program a drone to “look more closely” if needed, like at the underside of leaves or closeups of blighted ears of oats
• Use machine learning to estimate crop maturity and schedule equipment for harvesting
• Predict the best choice of crop and the specific choice of seeds to plant next year in each parcel of a large field

DO NO HARM!
Smart farming also raises issues of privacy and security:
• Banks and insurance companies might be eager to “see” private data
• There are more and more laws governing food-supply auditing
• If farms become dependent on IoT, how can we make the technology robust enough for a wide range of conditions (weather, dust, …)?
• Farmers aren’t hi-tech specialists. How hard will IoT be to maintain?

WE NEED TO DRILL DOWN ON A CONCRETE TASK REPRESENTATIVE OF THESE
In most of these tasks we see a shared structure:
• Start with a problem posed in the real world, like a farm or dairy.
• Work to understand the various dimensions, especially scalability issues tied to big data. If we design without scalability in mind, our solution will fail!
• Deploy sensors, then design a state machine that understands the sensor events and platform events, and uses functions to perform tasks.
• Perhaps develop new elastic µ-services your system will require.
• Debug this on a real system, like Azure IoT… not an easy job!

CROP MONITORING
Let’s focus initially on just one case: monitoring a field using drones. What major subsystems would we need?
• Mapping system to pull up a topographical map of the field to scan
• Basic drone flight control system to “follow” a flight plan
• Wind sensing and mapping subsystem, to “sail on the breeze” (not fight it)
• Image analysis: “Are these plants healthy or diseased?”
• Close examination: visit diseased plants, diagnose the issue, document it
• Data archive: downloads interesting images/video/etc. and retains them

A QUICK REMINDER
• IoT Edge: Who needs it? There might not always be a connection to the cloud, so we run a little “micro-cloud” close to the sensors.
• IoT Hub: Why bother? We use the IoT Hub to authenticate sensors, and to make outgoing TCP connections to them.
• Functions: What a nuisance! Dump ’em. Functions are “unavoidable.” This is where IoT events initially show up.
• µ-services: No need… use existing ones. Do use existing ones! But they may not cover the tasks your design requires.

FUNCTIONS? OR µ-SERVICES?
Recall that we have a choice: some tasks should run as state machines, keeping their state in an Azure key-value store.
Other tasks should be implemented by one (or many) µ-services that would understand our goals and send instructions to our drones. This would feel more like a standard “control center” approach.
In a scaled-out IoT setting, a solution needs elements of both kinds.
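
A minimal sketch of the first option, a stateless function that keeps its state-machine state in an external key-value store. Everything here is an assumption made for illustration: the dict `kv_store` stands in for an Azure key-value store, and the state names, event names, and `handle_event` are hypothetical rather than part of any Azure or FarmBeats API.

```python
# Hypothetical drone states and events, invented for illustration only.
TRANSITIONS = {
    ("idle", "takeoff_ok"): "scanning",
    ("scanning", "damage_suspected"): "close_examination",
    ("close_examination", "photo_taken"): "scanning",
    ("scanning", "battery_low"): "returning",
    ("returning", "landed"): "idle",
}

kv_store = {}  # stand-in for an external key-value store


def handle_event(drone_id, event):
    """Stateless handler: load the state, apply one transition, store it back."""
    state = kv_store.get(drone_id, "idle")
    # Events that make no sense in the current state leave it unchanged.
    new_state = TRANSITIONS.get((state, event), state)
    kv_store[drone_id] = new_state
    return new_state
```

Because the handler holds nothing between calls, any function instance can process the next event for a drone. The price is one store read and one store write per event.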

FUNCTIONS? OR NEW SERVICES?
Why is it so obvious that this isn’t a case for a “pure function” solution?
• What we’ve described would require an elaborate state machine.
• It might be very hard to debug such a complex function application.
• The logic for each state might be complicated, since everything will be event driven.
• As we “learn current conditions” we run into a big-data problem. A function server isn’t intended for such cases.

SHOULD EVERYTHING BE IN SERVICES?
Historically this was a common approach: people built specialized control systems and viewed devices as dumb. But few have the skills to pull it off.
In an IoT setting, massive scale brings massive loads!
• Any µ-services will need to be sharded, fault-tolerant, and highly responsive, and may have to leverage special hardware accelerators.
• If we think of a function layer as a kind of intelligent “cache” that can shield the µ-services from overload, we are approaching this the…

APPROACH THIS LEADS US TO?
We will use Azure functions for “lightweight” tasks and actions.
• Ideal for read-only actions like making a quick decision
• OK for reporting events that go into some kind of record or log
• But not for serious computing with heavy computation, big data, accelerators, or complex state machine sequences
Then build new µ-services for the heavy-weight tasks, like learning a new machine-learned model, or computing the optimal search path given the wind.
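
The split described on this slide can be sketched as follows. This is a hedged illustration, not Azure code: `decision_cache`, `event_log`, `heavy_task_queue`, the event kinds, and `function_handler` are all hypothetical names, with plain Python containers standing in for a cache, a log, and a message queue.

```python
from collections import deque

# Stand-ins, invented for illustration: a read-only cache a function can answer
# from, an event log, and a queue that feeds the heavy-weight µ-services.
decision_cache = {"field-7/wind": "north-east"}
event_log = []
heavy_task_queue = deque()

# Hypothetical task kinds that are too heavy to run inside a function.
HEAVY_KINDS = {"train_model", "plan_search_path"}


def function_handler(kind, payload):
    """Handle lightweight events inline and hand heavy work to a µ-service."""
    if kind == "lookup":
        # Read-only quick decision: answer directly from cached data.
        return decision_cache.get(payload)
    if kind in HEAVY_KINDS:
        # Heavy computation: enqueue it rather than computing here.
        heavy_task_queue.append((kind, payload))
        return "queued"
    # Everything else is treated as an event to record.
    event_log.append((kind, payload))
    return "logged"
```

In this arrangement the function layer absorbs the high-rate, cheap requests, so only a small fraction of events ever reaches the services behind the queue.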

THERE WON’T BE JUST ONE!
Divide the set of knowledge tasks into groups. Don’t ask one server to do everything. Instead, build distinct servers for each category of knowledge tasks.
So we would want
• One µ-service just for “flight planning”, or even two (one for “collision avoidance”)
• One for “sailing on a breeze”
• One for “drone health management”
• One for “deciding which photos are worth downloading”
• One for “identifying possible crop damage areas”

IMPLICATIONS?
Microsoft’s IoT users will need help building new µ-services.
By building a few of their own, for FarmBeats drones, the company can explore tradeoffs (like real-time, consistency, where to run the machine learning logic, what can stay in the “back” and what has to be at the edge).
There are also technical platform questions: networking connectivity, what to do right on the farm and what to do in the cloud, etc.

REMEMBER: AMAZON ENDED UP WITH HUNDREDS OF SERVICES / WEB PAGE!
Learn from others who have been down this path before you.
The whole game centers on breaking up the task into chunks that are self-contained, but “small” in scope!
If you think of this as one big monolithic task, you are certain to be doomed by the complexity of the overall undertaking!

HOW TO CREATE NEW SERVICES?
We can start with Jim Gray’s suggestion: use key-value sharding from the outset.
Within a shard, data will need to be replicated. This leads to what is called the “state machine replication model”, which involves
• A group of replicas (and a membership service to track the set)
• Each update occurs as a message delivered to all replicas
• The updates are delivered in the identical order
• No matter what happens (failures, restarts), “amnesia” won’t occur
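
The first three properties in this list can be sketched in a few lines. This is a toy under stated assumptions, not a real protocol: the classes `Replica` and `ReplicatedShard` are hypothetical, membership is a fixed list, a single in-process sequencer assigns the order, and nothing is persisted, so the "no amnesia" property is not covered.

```python
class Replica:
    """One replica of a shard: applies updates strictly in sequence order."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of updates applied so far

    def deliver(self, seq, key, value):
        if seq != self.applied:
            raise ValueError("update delivered out of order")
        self.data[key] = value
        self.applied += 1


class ReplicatedShard:
    """Toy replica group with fixed membership and a single sequencer."""

    def __init__(self, n_replicas):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.next_seq = 0

    def update(self, key, value):
        # Assign the next sequence number, then deliver to every member.
        seq = self.next_seq
        self.next_seq += 1
        for replica in self.replicas:
            replica.deliver(seq, key, value)
```

Since every replica starts empty and applies the same updates in the same order, all replicas hold identical data after each update. A production system additionally needs durable logging and a membership service that handles crashes and restarts.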

WILL THIS SCALE?
Jim Gray’s analysis told us that general database transactions won’t scale. So don’t even consider our sharded key-value service as a database.
We’ll want to aim for simple key-value operations, or small computations that can somehow be made fault-tolerant and atomic without scaling issues. This was a sweet spot in Jim’s model.
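
A minimal sketch of "simple key-value operations" over shards, assuming a fixed shard count and hash-based placement. The names `NUM_SHARDS`, `shard_for`, `put`, and `get` are hypothetical, and each shard is just an in-memory dict.

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed for this sketch
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key):
    """Map a key to a shard using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key, value):
    # Each operation touches exactly one shard, so no cross-shard
    # transaction is needed.
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)
```

Every `put` or `get` is confined to a single shard, which is what lets the service grow by adding shards. An operation spanning several keys on different shards would bring back the transaction problem the slide warns about.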

“ALL SHARDED, ALL THE TIME”
In computing classes, we really don’t learn to compute on data that is spread over devices.
IoT data will already be sharded when it enters the system, and all computation needs to be parallel and to keep the work sharded.
Sharding is a magic formula for scaling, but how can people learn to program in an “all-sharded, all the time” manner?

SO, BACK TO OUR FARMBEATS DRONES
[Figure labels: Message bus or queue; Azure Function Server; µ-services: some Azure provided, some “new”.]
Functions: lightweight, event-triggered programs in containers, with a “pay for what you use” resource model.

REVISIT OUR PICTURE
[Figure labels: Message bus or queue; Azure Function Server; µ-services: some Azure provided, some “new”.]
• Moment-by-moment operation of the drone is a good fit for the function programming model.
• A set of µ-services can own many of our other tasks, each specialized in some sub-task. Divide the job up into distinct kinds of work!
• Functions: lightweight, event-triggered programs in containers, with a “pay for what you use” resource model.

WHERE’S THE SCALE?
Our example shows just a few drones monitoring one field.
But “at scale” in a full deployment, you want to imagine hundreds of thousands of drones scanning many thousands of fields, and millions more sensors and actuators playing other roles.

LET’S PEEK INSIDE A MICROSERVICE
The inner structure would depend on design choices the developer would make. This particular example has a load balancer, a cache layer, and a back-end storage layer.
• External clients use standard RESTful RPC through a load balancer.
• Multicasts are used for cache invalidations and updates.
[Figure labels: Load balancer; Cache Layer; Back-end Store]
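
The three layers on this slide can be sketched in one process. This is an illustration under stated assumptions, not Derecho or Azure code: the classes `BackEndStore`, `CacheNode`, and `Service` are hypothetical, the load balancer is plain round-robin, and the "multicast" is a loop over the cache nodes.

```python
import itertools


class BackEndStore:
    """Stand-in for the back-end storage layer."""

    def __init__(self):
        self.data = {}


class CacheNode:
    """One member of the cache layer."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def get(self, key):
        if key not in self.cache:
            # Cache miss: fetch from the back end and remember the result.
            self.cache[key] = self.store.data.get(key)
        return self.cache[key]

    def invalidate(self, key):
        self.cache.pop(key, None)


class Service:
    def __init__(self, n_cache_nodes):
        self.store = BackEndStore()
        self.nodes = [CacheNode(self.store) for _ in range(n_cache_nodes)]
        self._next_node = itertools.cycle(self.nodes)  # round-robin balancer

    def read(self, key):
        return next(self._next_node).get(key)

    def write(self, key, value):
        self.store.data[key] = value
        # "Multicast" the invalidation so no cache node serves a stale value.
        for node in self.nodes:
            node.invalidate(key)
```

Without the invalidation loop in `write`, a read routed to a node that cached the old value would return stale data, which is why the slide pairs the cache layer with multicast.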

WHY DID THIS DERECHO PICTURE POP UP?
We aren’t saying that everyone will use Derecho.
We are saying that everyone will have to create new µ-services and that they will often have an internal structure, like in this example.
So they will need help doing that. Derecho is just one instance of a tool for helping people get these up and running.

A ROLL-YOUR-OWN µ-SERVICE OF THIS KIND MIGHT BE HARD TO BUILD!
The solution needs to restart into this configuration after failures, and to handle process crashes or reboots of individual components.
Data has to be stored and reloaded from files (or other µ-services).
We need to manage the service in a consistent manner and program it to self-repair after a crash or disruption.

WHY CAN DERECHO (AND OTHER TOOLS LIKE IT) HELP?
Derecho is Cornell’s software library for automating those kinds of tasks. The design was created with “intelligent edge” use cases in mind.
The developer would attach event handlers in various places, and Derecho automates the remainder of the “life cycle”.
This greatly simplifies the development challenge.

AND WHAT ARE THOSE OTHER TOOLS?
Microsoft and Amazon tend to offer them in a form like a graphic novel.
1) A story – “Sally was facing such-and-such a challenge.”
2) The visual story book – “These pictures illustrate the approach Sally used.”
3) A template showing how to combine Azure IoT components and how to customize them to solve Sally’s problem: “Here’s what she did.”
4) Visual Studio or VSCode can load that template, and you can then mutate it into the solution to your personal puzzle.

HOW MIGHT WE TACKLE THE CASES MENTIONED EARLIER?
Consider one example, image analysis: “Are these plants healthy or diseased?”
How might we solve such a problem using modern machine learning?
How would we turn our solution into a µ-service?
How would a function in a function server interact with it?

DIMENSIONS TO CONSIDER
[Figure labels: Data Acquisition; Interpretation; Intelligent Action; Knowledge Model]
In today’s machine learning systems, the knowledge model is often slightly stale: it takes hours to compute and hence was created offline.
But in IoT, each situation is special: the topology of each farm is unique, today’s weather is unique, the “job” we are doing today is specialized, etc.
Thus the standard approach will include real-time knowledge formation, through online data acquisition, learning and inference!

KNOWLEDGE MODEL BUILT YESTERDAY
This is an example of data ideally fitted to the CAP concept. We can take a key-value cache and load the model “as we access it” into the cache.
The model is basically unchanging while we are using it, so we don’t need to worry about consistency issues.
Even if the model is immense, by sharding it over the cache, we have space. But we may need to compute in a parallel manner to avoid centralizing the machine-learning decision step in a way that bottlenecks.

NEW KNOWLEDGE GAINED AT THE FARM?
These are going to be knowledge models, too. But they would be created dynamically and populated by rapid computational tasks running on the cloud, in the Azure Intelligent IoT Edge. (Reminder: this is the confusing name of the new first tier.)
Requirements: again, a way to compute in a highly parallel way, but also now to replicate the new knowledge models created by these edge learning tasks.

COMPUTATIONAL PATTERNS
We will spend more time on this in the last weeks of the class, but we clearly need to have at least some idea now.
The MapReduce pattern is the most common one.
• Some task is broken into shards and spread over N workers.
• They each compute for a little while on a part of the job. Their outputs are (key, value) sub-results for the bigger task.
• Then in the “reduce” step, we shuffle the sub-results to group data by key at a suitable shard, which can combine the set of values. This is the “reduce”.
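
The three steps above can be sketched as a single-process toy. The data and names are invented for illustration (`map_phase`, `shuffle`, `reduce_phase`, and the plant-health labels are not from any framework), and the shards here are plain lists rather than data hosted on separate machines.

```python
from collections import defaultdict


def map_phase(shard):
    """Each worker emits (key, value) sub-results for its own shard."""
    return [(label, 1) for label in shard]


def shuffle(mapped, n_reducers):
    """Group sub-results by key at the reducer chosen for that key."""
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapped:
        for key, value in pairs:
            # Toy stable hash: sum of the key's bytes picks the reducer.
            target = sum(key.encode("utf-8")) % n_reducers
            buckets[target][key].append(value)
    return buckets


def reduce_phase(buckets):
    """Each reducer combines the values it holds for each of its keys."""
    return {
        key: sum(values)
        for bucket in buckets
        for key, values in bucket.items()
    }


# Hypothetical per-drone classification results, one list per shard.
shards = [["healthy", "blight", "healthy"], ["healthy", "drought"], ["blight"]]
counts = reduce_phase(shuffle([map_phase(s) for s in shards], n_reducers=2))
```

Here `counts` ends up as the number of plants per label across all shards. Each key is routed to exactly one reducer, so the reducers can work independently and their outputs are themselves a sharded (key, value) result.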

MAP-REDUCE PATTERN IN PICTURES
[Figure labels: Leader; Map; Shuffle; Big sharded dataset hosted on machines that can run AI tasks]
A full shuffle is an n × n pattern: every shard sends data to every other shard! This avoids ever having all our work concentrated on any single process.

AT SCALE…
IoT systems will want to implement this pattern of computation (lots of instances of it, maybe millions running in parallel):
• At the edge, in the µ-services layer
• Information will need to be replicated on their behalf, at very high speed
• The reduced (key-value) data will be a sharded representation of new knowledge deduced by the computation!

WE END UP WITH A HUGE DATAFLOW GRAPH
• Vast numbers of data sources
• Functions used to handle simple events and absorb load
• Heavily sharded edge µ-services do real-time knowledge acquisition and decision making using ML computational models

BIG DATA?
Definitely! These arrows might carry photos or videos: megabytes or even hundreds of megabytes per “object”.
Just moving the data becomes a cost concern: in the cloud, copying isn’t very fast.
But recall that Derecho’s object store uses RDMA for big-data movement operations. So Derecho is an example of a viable solution.

HOW MUCH CONSISTENCY?
For many tasks, modern machine learning is “stochastic”, meaning that the learning algorithm converges in a non-deterministic way and could settle on any of a number of result states.
Consistent replication of IoT input is a common need, even for applications that use stochastic, noise-tolerant techniques. The reason is that “random noise” is very different from “stale or misleading input data”.
Again, Derecho is a good fit to the requirement.

HOW MUCH FAULT TOLERANCE?
If we want FarmBeats to be reliable, we should plan on “riding out” some failures. By some estimates, one failure every few hours might be common.
Moreover, elasticity forces reconfigurations, such as adding or dropping servers.
So the shards and computations need to be managed in a fault-tolerant and elastic manner. Derecho has built-in help for this, too.

SO, BUILDER TOOLS COULD PLAY KEY ROLES!
Azure IoT and Amazon lack a tool like Derecho today. The IDE cartoon stories are very limited at this point.
But Derecho itself does run on both of these platforms, and we are working with the Azure IoT team to integrate it cleanly into their IDE environments, and maybe even to get permission to use RDMA too.
In CS 5412 projects, we encourage you to work with it for any new µ-services you need to create.