Question:
You have been tasked with developing a prototype fraud system for a compliance team at a financial institution. The system must detect
anomalous events within data pushed to a topic by operational systems. These data are consumed and evaluated by your model, and responses are sent via PubSub for manual investigation by the compliance team. The results of the investigation (confirming that the point was an ‘outlier’) are returned via PubSub for integration into the model improvement process.
1. Your data science solution will reside within the Evaluate component, and the operational systems have been publishing data to the metrics topic for the last few days. How do you retrieve data for evaluation (describe it without code)? (1)
2. Suppose your solution is written in Python and runs within a pod in Kubernetes, where you currently use a synchronous pull approach to consume data. You notice that you are falling behind on processing, so you scale your solution horizontally by adding more processing instances (pods) and subscriptions. Unfortunately, your PubSub costs have gone up after scaling out; why is that, and how would you reduce them? (3)
3. Given the simplicity of the problem, you have implemented a scikit-learn model that requires no state when scoring beyond what is consumed from the metrics topic. To leverage the scalability of the Dataflow runner, you have decided to deploy your model in a Beam pipeline. Describe the approach (do not provide code without an explanation). (3)
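Question 2 hinges on PubSub's delivery and billing model: every subscription receives its own copy of every published message, while subscribers attached to the *same* subscription split that single copy between them. The toy broker below is a pure-Python simulation of that semantics (not the PubSub client API; all names and message counts are illustrative) to make the cost difference visible:

```python
# Toy in-memory model of Pub/Sub fan-out semantics (illustrative only):
# each subscription gets a full copy of the stream; pods attached to the
# same subscription load-balance that one copy between them.
from collections import defaultdict
from itertools import cycle


class ToyTopic:
    def __init__(self):
        self.subscriptions = {}  # subscription name -> list of attached pods

    def create_subscription(self, name, pods):
        self.subscriptions[name] = pods

    def publish(self, n_messages, delivered):
        for pods in self.subscriptions.values():
            pod_cycle = cycle(pods)  # round-robin within one subscription
            for _ in range(n_messages):
                delivered[next(pod_cycle)] += 1


def delivered_volume(n_subs, pods_per_sub, n_messages=100):
    """Total messages delivered (and billed) for a given topology."""
    topic = ToyTopic()
    for s in range(n_subs):
        topic.create_subscription(
            f"sub-{s}", [f"sub-{s}-pod-{p}" for p in range(pods_per_sub)])
    delivered = defaultdict(int)
    topic.publish(n_messages, delivered)
    return sum(delivered.values())
```

With one subscription per pod (`delivered_volume(3, 1)`), delivered volume triples relative to the published volume; with three pods sharing one subscription (`delivered_volume(1, 3)`), it stays flat. That is why attaching all pods to a single shared subscription, rather than creating a subscription per pod, brings the cost back down.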
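For question 3, a common pattern is a Beam `DoFn` that loads the model once per worker in `setup()` and scores each message in `process()`. A minimal sketch, assuming `apache_beam` is installed; the `MeanThresholdModel` stub, its 10.0 threshold, and the in-memory source are illustrative stand-ins for the real pickled scikit-learn model and the PubSub I/O transforms:

```python
import apache_beam as beam


class MeanThresholdModel:
    """Illustrative stand-in for the pickled scikit-learn model."""
    def predict(self, values):
        return [1 if v > 10.0 else 0 for v in values]  # 1 = outlier


class ScoreFn(beam.DoFn):
    def setup(self):
        # Called once per worker: load the model here (in production,
        # e.g. joblib.load() from a bucket) so scoring stays stateless.
        self.model = MeanThresholdModel()

    def process(self, element):
        yield {"value": element, "outlier": self.model.predict([element])[0]}


def run():
    # DirectRunner demo; on Dataflow, Create/Map would be replaced by
    # beam.io.ReadFromPubSub(subscription=...) and beam.io.WriteToPubSub(...).
    with beam.Pipeline() as p:
        (p
         | "Read" >> beam.Create([3.0, 20.0])
         | "Score" >> beam.ParDo(ScoreFn())
         | "Emit" >> beam.Map(print))
```

Because the model needs no state beyond the incoming message, each element can be scored independently, which is exactly what lets the Dataflow runner parallelise the `ParDo` freely.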