PMLE04

50 questions • 8 months ago

Question list

(Q#151) While running a model training pipeline on Vertex AI, you discover that the evaluation step is failing because of an out-of-memory error. You are currently using TensorFlow Model Analysis (TFMA) with a standard Evaluator TensorFlow Extended (TFX) pipeline component for the evaluation step. You want to stabilize the pipeline without downgrading the evaluation quality while minimizing infrastructure overhead. What should you do?

Include the flag --runner=DataflowRunner in beam_pipeline_args to run the evaluation step on Dataflow.
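
As a minimal sketch of what that flag list might look like, the snippet below builds `beam_pipeline_args` as a plain Python list. The project ID, region, and bucket path are hypothetical placeholders, not values from the question. In TFX, a list like this is typically passed through the pipeline's `beam_pipeline_args` parameter.

```python
# Hypothetical project, region, and bucket values; replace with your own.
PROJECT_ID = "my-project"
REGION = "us-central1"
TEMP_LOCATION = "gs://my-bucket/tmp"

# Beam pipeline options expressed as command-line style flags.
# The runner flag is what moves Beam-based steps (such as the Evaluator) onto Dataflow.
beam_pipeline_args = [
    "--runner=DataflowRunner",
    f"--project={PROJECT_ID}",
    f"--region={REGION}",
    f"--temp_location={TEMP_LOCATION}",
]
```

Because Dataflow scales workers on demand, the evaluation no longer has to fit in the memory of a single machine.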

(Q#152) You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?

Apply one-hot encoding on the categorical variables in the test data
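
One way to read this answer is that the test data should be encoded against the same category vocabulary that was learned from the training data, so both splits end up with identical columns. The sketch below illustrates that reading in plain Python; the helper names and the toy color data are hypothetical.

```python
def fit_vocabulary(values):
    """Collect the sorted set of categories seen in the training data."""
    return sorted(set(values))


def one_hot(values, vocabulary):
    """Encode each value against a fixed vocabulary; unseen values become all zeros."""
    index = {category: i for i, category in enumerate(vocabulary)}
    rows = []
    for value in values:
        row = [0] * len(vocabulary)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


train_colors = ["red", "green", "blue", "green"]
test_colors = ["red", "green"]  # "blue" never appears in the test split

vocab = fit_vocabulary(train_colors)        # learned from training data only
train_encoded = one_hot(train_colors, vocab)
test_encoded = one_hot(test_colors, vocab)  # still has a (zero) "blue" column
```

Reusing the training vocabulary keeps the feature width the same for both splits even when a category is absent from the test set.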

(Q#153) You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

Oversample the fraudulent transactions 10 times.
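
A rough sketch of 10x oversampling on a toy dataset follows. The function name, the `is_fraud` key, and the 990/10 class counts are illustrative assumptions; in practice the duplication would be applied to the training split only.

```python
import random


def oversample_minority(rows, label_key, minority_label, factor, seed=0):
    """Repeat every minority-class row `factor` times and shuffle the result."""
    minority = [r for r in rows if r[label_key] == minority_label]
    majority = [r for r in rows if r[label_key] != minority_label]
    resampled = majority + minority * factor
    random.Random(seed).shuffle(resampled)
    return resampled


# Toy data: 1% of 1,000 transactions are fraudulent.
transactions = [{"id": i, "is_fraud": 1 if i < 10 else 0} for i in range(1000)]

balanced = oversample_minority(transactions, "is_fraud", 1, factor=10)
fraud_count = sum(r["is_fraud"] for r in balanced)
fraud_share = fraud_count / len(balanced)
```

With these toy numbers the fraud share rises from 1% to roughly 9%, which gives the trees more minority examples to split on.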

(Q#154) You are developing a classification model to support predictions for your company’s various products. The dataset you were given for model development has class imbalance. You need to minimize false positives and false negatives. What evaluation metric should you use to properly train the model?

F1 score
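
For reference, F1 is the harmonic mean of precision and recall, so it penalizes both false positives and false negatives. The sketch below computes it from raw counts; the counts in the example are made up.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Example: 8 true positives, 2 false positives, 8 false negatives
# gives precision 0.8 and recall 0.5.
example_f1 = f1_score(tp=8, fp=2, fn=8)
```

True negatives do not appear in the formula, which is why F1 is less misleading than accuracy on imbalanced data.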

(Q#155) You are training an object detection machine learning model on a dataset that consists of three million X-ray images, each roughly 2 GB in size. You are using Vertex AI Training to run a custom training application on a Compute Engine instance with 32 cores, 128 GB of RAM, and 1 NVIDIA P100 GPU. You notice that model training is taking a very long time. You want to decrease training time without sacrificing model performance. What should you do?

Use the tf.distribute.Strategy API and run a distributed training job.

(Q#156) You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

Train a classification Vertex AutoML model.

(Q#157) You recently developed a deep learning model. To test your new model, you trained it for a few epochs on a large dataset. You observe that the training and validation losses barely changed during the training run. You want to quickly debug your model. What should you do first?

Verify that your model can obtain a low loss on a small subset of the dataset

(Q#158) You are a data scientist at an industrial equipment manufacturing company. You are developing a regression model to estimate the power consumption in the company’s manufacturing plants based on sensor data collected from all of the plants. The sensors collect tens of millions of records every day. You need to schedule daily training runs for your model that use all the data collected up to the current date. You want your model to scale smoothly and require minimal development work. What should you do?

Develop a regression model using BigQuery ML.

(Q#159) Your organization manages an online message board. A few months ago, you discovered an increase in toxic language and bullying on the message board. You deployed an automated text classifier that flags certain comments as toxic or harmful. Now some users are reporting that benign comments referencing their religion are being misclassified as abusive. Upon further inspection, you find that your classifier's false positive rate is higher for comments that reference certain underrepresented religious groups. Your team has a limited budget and is already overextended. What should you do?

Add synthetic training data where those phrases are used in non-toxic ways.

(Q#160) You work for a magazine distributor and need to build a model that predicts which customers will renew their subscriptions for the upcoming year. Using your company’s historical data as your training set, you created a TensorFlow model and deployed it to Vertex AI. You need to determine which customer attribute has the most predictive power for each prediction served by the model. What should you do?

Use Vertex Explainable AI. Submit each prediction request with the 'explain' keyword to retrieve feature attributions using the sampled Shapley method.

(Q#161) You are an ML engineer at a manufacturing company. You are creating a classification model for a predictive maintenance use case. You need to predict whether a crucial machine will fail in the next three days so that the repair crew has enough time to fix the machine before it breaks. Regular maintenance of the machine is relatively inexpensive, but a failure would be very costly. You have trained several binary classifiers to predict whether the machine will fail, where a prediction of 1 means that the ML model predicts a failure. You are now evaluating each model on an evaluation dataset. You want to choose a model that prioritizes detection while ensuring that more than 50% of the maintenance jobs triggered by your model address an imminent machine failure. Which model should you choose?

The model with the highest recall where precision is greater than 0.5.
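
The selection rule can be sketched as a filter on precision followed by a maximum over recall. The candidate models and their metric values below are invented for illustration.

```python
# Hypothetical evaluation results for three candidate classifiers.
candidates = [
    {"name": "model_a", "precision": 0.45, "recall": 0.95},
    {"name": "model_b", "precision": 0.55, "recall": 0.85},
    {"name": "model_c", "precision": 0.80, "recall": 0.60},
]


def pick_model(models, min_precision=0.5):
    """Return the highest-recall model whose precision exceeds min_precision."""
    eligible = [m for m in models if m["precision"] > min_precision]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m["recall"])


best = pick_model(candidates)
```

Here `model_a` has the best recall but fails the precision constraint, so `model_b` is chosen: more than half of its triggered maintenance jobs address a real failure, and it detects the most failures among the eligible models.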

(Q#162) You built a custom ML model using scikit-learn. Training time is taking longer than expected. You decide to migrate your model to Vertex AI Training, and you want to improve the model’s training time. What should you try out first?

Train your model using Vertex AI Training with CPUs.

(Q#163) You are an ML engineer at a retail company. You have built a model that predicts a coupon to offer an ecommerce customer at checkout based on the items in their cart. When a customer goes to checkout, your serving pipeline, which is hosted on Google Cloud, joins the customer's existing cart with a row in a BigQuery table that contains the customers' historic purchase behavior and uses that as the model's input. The web team is reporting that your model is returning predictions too slowly to load the coupon offer with the rest of the web page. How should you speed up your model's predictions?

Use a low latency database for the customers’ historic purchase behavior.

(Q#164) You work for a small company that has deployed an ML model with autoscaling on Vertex AI to serve online predictions in a production environment. The current model receives about 20 prediction requests per hour with an average response time of one second. You have retrained the same model on a new batch of data, and now you are canary testing it, sending ~10% of production traffic to the new model. During this canary test, you notice that prediction requests for your new model are taking between 30 and 180 seconds to complete. What should you do?

Turn off auto-scaling for the online prediction service of your new model. Use manual scaling with one node always available.

(Q#165) You want to train an AutoML model to predict house prices by using a small public dataset stored in BigQuery. You need to prepare the data and want to use the simplest, most efficient approach. What should you do?

Write a query that preprocesses the data by using BigQuery and creates a new table. Create a Vertex AI managed dataset with the new table as the data source.

(Q#166) You developed a Vertex AI ML pipeline that consists of preprocessing and training steps, and each set of steps runs on a separate custom Docker image. Your organization uses GitHub and GitHub Actions as CI/CD to run unit and integration tests. You need to automate the model retraining workflow so that it can be initiated both manually and when a new version of the code is merged in the main branch. You want to minimize the steps required to build the workflow while also allowing for maximum flexibility. How should you configure the CI/CD workflow?

Trigger GitHub Actions to run the tests, launch a Cloud Build workflow to build custom Docker images, push the images to Artifact Registry, and launch the pipeline in Vertex AI Pipelines.

(Q#167) You are working with a dataset that contains customer transactions. You need to build an ML model to predict customer purchase behavior. You plan to develop the model in BigQuery ML, and export it to Cloud Storage for online prediction. You notice that the input data contains a few categorical features, including product category and payment method. You want to deploy the model as quickly as possible. What should you do?

Use the ML.ONE_HOT_ENCODER function on the categorical features and select the encoded categorical features and non-categorical features as inputs to create your model.

(Q#168) You need to develop an image classification model by using a large dataset that contains labeled images in a Cloud Storage bucket. What should you do?

Import the labeled images as a managed dataset in Vertex AI and use AutoML to train the model.

(Q#169) You are developing a model to detect fraudulent credit card transactions. You need to prioritize detection, because missing even one fraudulent transaction could severely impact the credit card holder. You used AutoML to train a model on users' profile information and credit card transaction data. After training the initial model, you notice that the model is failing to detect many fraudulent transactions. How should you adjust the training parameters in AutoML to improve model performance? (Choose two.)

Decrease the score threshold. Add more positive examples to the training set.
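
To illustrate why lowering the score threshold helps detection, the sketch below measures recall at two thresholds on a handful of made-up scores and labels (1 means fraudulent).

```python
def recall_at_threshold(scores, labels, threshold):
    """Fraction of actual positives whose score is at or above the threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented model scores and ground-truth labels.
scores = [0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0]

recall_high = recall_at_threshold(scores, labels, 0.5)   # stricter threshold
recall_low = recall_at_threshold(scores, labels, 0.25)   # more permissive threshold
```

On this toy data recall rises from 0.5 to 0.75 when the threshold drops, at the cost of one extra false positive.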

(Q#170) You need to deploy a scikit-learn classification model to production. The model must be able to serve requests 24/7, and you expect millions of requests per second to the production application from 8 am to 7 pm. You need to minimize the cost of deployment. What should you do?

Deploy an online Vertex AI prediction endpoint. Set the max replica count to 100

(Q#171) You work with a team of researchers to develop state-of-the-art algorithms for financial analysis. Your team develops and debugs complex models in TensorFlow. You want to maintain the ease of debugging while also reducing the model training time. How should you set up your training environment?

Configure a v3-8 TPU VM. SSH into the VM to train and debug the model.

(Q#172) You created an ML pipeline with multiple input parameters. You want to investigate the tradeoffs between different parameter combinations. The parameter options are: ・Input dataset ・Max tree depth of the boosted tree regressor ・Optimizer learning rate. You need to compare the pipeline performance of the different parameter combinations measured in F1 score, time to train, and model complexity. You want your approach to be reproducible, and track all pipeline runs on the same platform. What should you do?

1. Create an experiment in Vertex AI Experiments. 2. Create a Vertex AI pipeline with a custom model training job as part of the pipeline. Configure the pipeline’s parameters to include those you are investigating. 3. Submit multiple runs to the same experiment, using different values for the parameters.

(Q#173) You received a training-serving skew alert from a Vertex AI Model Monitoring job running in production. You retrained the model with more recent training data, and deployed it back to the Vertex AI endpoint, but you are still receiving the same alert. What should you do?

Update the model monitoring job to use the more recent training data that was used to retrain the model.

(Q#174) You developed a custom model by using Vertex AI to forecast the sales of your company’s products based on historical transactional data. You anticipate changes in the feature distributions and the correlations between the features in the near future. You also expect to receive a large volume of prediction requests. You plan to use Vertex AI Model Monitoring for drift detection and you want to minimize the cost. What should you do?

Use the features and the feature attributions for monitoring. Set a prediction-sampling-rate value that is closer to 0 than 1.

(Q#175) You have recently trained a scikit-learn model that you plan to deploy on Vertex AI. This model will support both online and batch prediction. You need to preprocess input data for model inference. You want to package the model for deployment while minimizing additional code. What should you do?

1. Wrap your model in a custom prediction routine (CPR), and build a container image from the CPR local model. 2. Upload your scikit-learn model container to Vertex AI Model Registry. 3. Deploy your model to Vertex AI Endpoints, and create a Vertex AI batch prediction job.

(Q#176) You work for a food product company. Your company’s historical sales data is stored in BigQuery. You need to use Vertex AI’s custom training service to train multiple TensorFlow models that read the data from BigQuery and predict future sales. You plan to implement a data preprocessing algorithm that performs min-max scaling and bucketing on a large number of features before you start experimenting with the models. You want to minimize preprocessing time, cost, and development effort. How should you configure this workflow?

Write SQL queries to transform the data in-place in BigQuery.
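
The answer performs these transforms in SQL inside BigQuery. Purely to show the arithmetic behind min-max scaling and bucketing, here is a small plain-Python sketch; the function names, sales values, and bucket boundaries are illustrative assumptions and not part of the question.

```python
import bisect


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bucketize(values, boundaries):
    """Map each value to the index of the bucket defined by sorted boundaries."""
    return [bisect.bisect_right(boundaries, v) for v in values]


sales = [10.0, 20.0, 30.0, 50.0]
scaled = min_max_scale(sales)
buckets = bucketize(sales, [15.0, 40.0])  # three buckets: <15, 15 to <40, >=40
```

Doing the equivalent computation in SQL keeps the data in BigQuery, so no separate processing cluster or export step is needed.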

(Q#177) You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data, completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the processed data to train a model. You need to update the model’s code to allow you to test different algorithms. You want to reduce pipeline execution time and cost while also minimizing pipeline changes. What should you do?

Enable caching for the pipeline job, and disable caching for the model training step.

(Q#178) You work for a bank. You have created a custom model to predict whether a loan application should be flagged for human review. The input features are stored in a BigQuery table. The model is performing well, and you plan to deploy it to production. Due to compliance requirements, the model must provide explanations for each prediction. You want to add this functionality to your model code with minimal effort and provide explanations that are as accurate as possible. What should you do?

Upload the custom model to Vertex AI Model Registry and configure feature-based attribution by using sampled Shapley with input baselines.

(Q#179) You recently used XGBoost to train a model in Python that will be used for online serving. Your model prediction service will be called by a backend service implemented in Golang running on a Google Kubernetes Engine (GKE) cluster. Your model requires pre- and postprocessing steps. You need to implement the processing steps so that they run at serving time. You want to minimize code changes and infrastructure maintenance, and deploy your model into production as quickly as possible. What should you do?

Use the Predictor interface to implement a custom prediction routine. Build the custom container, upload the container to Vertex AI Model Registry and deploy it to a Vertex AI endpoint.

(Q#180) You recently deployed a pipeline in Vertex AI Pipelines that trains and pushes a model to a Vertex AI endpoint to serve real-time traffic. You need to continue experimenting and iterating on your pipeline to improve model performance. You plan to use Cloud Build for CI/CD. You want to quickly and easily deploy new pipelines into production, and you want to minimize the chance that the new pipeline implementations will break in production. What should you do?

Set up a CI/CD pipeline that builds and tests your source code and then deploys built artifacts into a pre-production environment. After a successful pipeline run in the pre-production environment, deploy the pipeline to production.

(Q#181) You work for a bank with strict data governance requirements. You recently implemented a custom model to detect fraudulent transactions. You want your training code to download internal data by using an API endpoint hosted in your project’s network. You need the data to be accessed in the most secure way, while mitigating the risk of data exfiltration. What should you do?

Enable VPC Service Controls for peerings, and add Vertex AI to a service perimeter.

(Q#182) You are deploying a new version of a model to a production Vertex AI endpoint that is serving traffic. You plan to direct all user traffic to the new model. You need to deploy the model with minimal disruption to your application. What should you do?

1. Create a new model. Set the parentModel parameter to the model ID of the currently deployed model. Upload the model to Vertex AI Model Registry. 2. Deploy the new model to the existing endpoint, and set the new model to 100% of the traffic

(Q#183) You are training an ML model on a large dataset. You are using a TPU to accelerate the training process. You notice that the training process is taking longer than expected. You discover that the TPU is not reaching its full capacity. What should you do?

Increase the batch size

(Q#184) You work for a retail company. You have a managed tabular dataset in Vertex AI that contains sales data from three different stores. The dataset includes several features, such as store name and sale timestamp. You want to use the data to train a model that makes sales predictions for a new store that will open soon. You need to split the data between the training, validation, and test sets. What approach should you use to split the data?

Use Vertex AI chronological split, and specify the sales timestamp feature as the time variable
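
Conceptually, a chronological split orders rows by the time column and assigns the earliest rows to training and the latest to testing. The sketch below shows the idea in plain Python with assumed 80/10/10 fractions; the function name and the toy rows are hypothetical, and this is not how the managed service is invoked.

```python
def chronological_split(rows, time_key, train_frac=0.8, val_frac=0.1):
    """Sort rows by time, then cut them into train / validation / test in that order."""
    ordered = sorted(rows, key=lambda r: r[time_key])
    n = len(ordered)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Ten toy sales rows, deliberately listed newest first.
rows = [{"sale_timestamp": f"2024-01-{d:02d}", "amount": d} for d in range(10, 0, -1)]

train, val, test = chronological_split(rows, "sale_timestamp")
```

Because the test set contains only the most recent rows, evaluation mimics predicting the future from the past and avoids leaking later sales into training.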

(Q#185) You have developed a BigQuery ML model that predicts customer churn, and deployed the model to Vertex AI Endpoints. You want to automate the retraining of your model by using minimal additional code when model feature values change. You also want to minimize the number of times that your model is retrained to reduce training costs. What should you do?

1. Create a Vertex AI Model Monitoring job configured to monitor prediction drift. 2. Configure alert monitoring to publish a message to a Pub/Sub queue when a monitoring alert is detected. 3. Use a Cloud Function to monitor the Pub/Sub queue, and trigger retraining in BigQuery.

(Q#186) You have been tasked with deploying prototype code to production. The feature engineering code is in PySpark and runs on Dataproc Serverless. The model training is executed by using a Vertex AI custom training job. The two steps are not connected, and the model training must currently be run manually after the feature engineering step finishes. You need to create a scalable and maintainable production process that runs end-to-end and tracks the connections between steps. What should you do?

Use the Kubeflow Pipelines SDK to write code that specifies two components: (1) a Dataproc Serverless component that launches the feature engineering job, and (2) a custom component wrapped in the create_custom_training_job_from_component utility that launches the custom model training job. Create a Vertex AI Pipelines job to link and run both components.

(Q#187) You recently deployed a scikit-learn model to a Vertex AI endpoint. You are now testing the model on live production traffic. While monitoring the endpoint, you discover twice as many requests per hour than expected throughout the day. You want the endpoint to efficiently scale when the demand increases in the future to prevent users from experiencing high latency. What should you do?

Configure an appropriate minReplicaCount value based on expected baseline traffic

(Q#188) You work at a bank. You have a custom tabular ML model that was provided by the bank’s vendor. The training data is not available due to its sensitivity. The model is packaged as a Vertex AI Model serving container, which accepts a string as input for each prediction instance. In each string, the feature values are separated by commas. You want to deploy this model to production for online predictions and monitor the feature distribution over time with minimal effort. What should you do?

1. Upload the model to Vertex AI Model Registry, and deploy the model to a Vertex AI endpoint. 2. Create a Vertex AI Model Monitoring job with feature drift detection as the monitoring objective, and provide an instance schema.

(Q#189) You are implementing a batch inference ML pipeline in Google Cloud. The model was developed using TensorFlow and is stored in SavedModel format in Cloud Storage. You need to apply the model to a historical dataset containing 10 TB of data that is stored in a BigQuery table. How should you perform the inference?

Import the TensorFlow model by using the CREATE MODEL statement in BigQuery ML. Apply the historical data to the TensorFlow model

(Q#190) You recently deployed a model to a Vertex AI endpoint. Your data drifts frequently, so you have enabled request-response logging and created a Vertex AI Model Monitoring job. You have observed that your model is receiving higher traffic than expected. You need to reduce the model monitoring cost while continuing to quickly detect drift. What should you do?

Decrease the sample_rate parameter in the RandomSampleConfig of the monitoring job

(Q#191) You work for a retail company. You have created a Vertex AI forecast model that produces monthly item sales predictions. You want to quickly create a report that will help to explain how the model calculates the predictions. You have one month of recent actual sales data that was not included in the training dataset. How should you generate data for your report?

Create a batch prediction job by using the actual sales data, and configure the job settings to generate feature attributions. Compare the results in the report.

(Q#192) Your team has a model deployed to a Vertex AI endpoint. You have created a Vertex AI pipeline that automates the model training process and is triggered by a Cloud Function. You need to prioritize keeping the model up-to-date, but also minimize retraining costs. How should you configure retraining?

Enable model monitoring on the Vertex AI endpoint. Configure Pub/Sub to call the Cloud Function when feature drift is detected

(Q#193) Your company stores a large number of audio files of phone calls made to your customer call center in an on-premises database. Each audio file is in wav format and is approximately 5 minutes long. You need to analyze these audio files for customer sentiment. You plan to use the Speech-to-Text API. You want to use the most efficient approach. What should you do?

1. Upload the audio files to Cloud Storage. 2. Call the speech:longrunningrecognize API endpoint to generate transcriptions. 3. Create a Cloud Function that calls the Natural Language API by using the analyzeSentiment method.

(Q#194) You work for a social media company. You want to create a no-code image classification model for an iOS mobile application to identify fashion accessories. You have a labeled dataset in Cloud Storage. You need to configure a training workflow that minimizes cost and serves predictions with the lowest possible latency. What should you do?

Train the model by using AutoML Edge, and export it as a Core ML model. Configure your mobile application to use the .mlmodel file directly.

(Q#195) You work for a retail company. You have been asked to develop a model to predict whether a customer will purchase a product on a given day. Your team has processed the company’s sales data, and created a table with the following columns: ・Customer_id ・Product_id ・Date ・Days_since_last_purchase (measured in days) ・Average_purchase_frequency (measured in 1/days) ・Purchase (binary class, if customer purchased product on the Date) You need to interpret your model’s results for each individual prediction. What should you do?

Create a Vertex AI tabular dataset. Train an AutoML model to predict customer purchases. Deploy the model to a Vertex AI endpoint and enable feature attributions. Use the “explain” method to get feature attribution values for each individual prediction.
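To make "feature attribution values for each individual prediction" concrete, here is a minimal sketch that computes exact Shapley values for a toy scoring function over two of the table's features. It is not Vertex AI's implementation (which relies on sampling-based approximations for tabular models); the scoring function, its coefficients, and the baseline values are invented for illustration.

```python
from itertools import permutations
from math import factorial


def toy_purchase_score(features):
    """Hypothetical linear score standing in for a trained model."""
    return (
        0.5
        - 0.01 * features["Days_since_last_purchase"]
        + 2.0 * features["Average_purchase_frequency"]
    )


def shapley_attributions(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every order in which features switch from baseline to instance."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            updated = predict(current)
            totals[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in totals.items()}


# Illustrative instance and baseline (for example, dataset averages).
instance = {"Days_since_last_purchase": 3, "Average_purchase_frequency": 0.2}
baseline = {"Days_since_last_purchase": 30, "Average_purchase_frequency": 0.05}
attributions = shapley_attributions(toy_purchase_score, instance, baseline)
print(attributions)
```

The attributions sum to the difference between the instance's score and the baseline's score, which is what lets them be read as each feature's share of one specific prediction.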

(Q#196) You work for a company that captures live video footage of checkout areas in their retail stores. You need to use the live video footage to build a model to detect the number of customers waiting for service in near real time. You want to implement a solution quickly and with minimal effort. How should you build the model?

Use the Vertex AI Vision Occupancy Analytics model.

(Q#197) You work as an analyst at a large banking firm. You are developing a robust, scalable ML pipeline to train several regression and classification models. Your primary focus for the pipeline is model interpretability. You want to productionize the pipeline as quickly as possible. What should you do?

Use Tabular Workflow for TabNet through Vertex AI Pipelines to train attention-based models.

(Q#198) You developed a Transformer model in TensorFlow to translate text. Your training data includes millions of documents in a Cloud Storage bucket. You plan to use distributed training to reduce training time. You need to configure the training job while minimizing the effort required to modify code and to manage the cluster’s configuration. What should you do?

Create a training job that uses Cloud TPU VMs. Use tf.distribute.TPUStrategy for distribution.

(Q#199) You are developing a process for training and running your custom model in production. You need to be able to show lineage for your model and predictions. What should you do?

1. Use a Vertex AI Pipelines custom training job component to train your model. 2. Generate predictions by using a Vertex AI Pipelines model batch predict component.

(Q#200) You work for a hotel and have a dataset that contains customers’ written comments scanned from paper-based customer feedback forms, which are stored as PDF files. Every form has the same layout. You need to quickly predict an overall satisfaction score from the customer comments on each form. How should you accomplish this task?

Uptrain a Document AI custom extractor to parse the text in the comments section of each PDF file. Use the Natural Language API analyzeSentiment feature to infer overall satisfaction scores.
