New features added to Bedrock
At BasisAI, our mission is to help enterprises create value through data-driven decision-making by enabling them to productionise machine learning (ML) models swiftly and responsibly. As organisations become more data-driven, a unified, enterprise-grade productionisation platform can help them scale their ML processes.
We’re introducing two new features to our Bedrock platform to simplify the ML project lifecycle with appropriate governance processes: Managed Notebooks and Approval Workflows.
The Jupyter Notebook is one of the most popular data science tools for machine learning experimentation and development. It is designed for fast, iterative development and gives users the flexibility to run a single block of code instead of the entire script, which makes it well suited for exploratory work. This enables data scientists to fail fast and learn fast.
Although notebooks are an integral part of machine learning development, provisioning notebook infrastructure with appropriate permissions is still a complex task. The problem is especially prominent in larger enterprises where data science teams are siloed from infrastructure teams: requesting resources to run experiments becomes a lengthy process, costing data science teams valuable time.
To solve this problem for enterprises, we are releasing a Managed Notebooks feature in Bedrock. Users can now spin up notebook servers under Bedrock projects using pre-built images, and run their experiments in the cloud. This gives data science teams the convenience of self-service resource provisioning in the cloud, while freeing up infrastructure teams from managing notebook servers.
Bedrock takes care of several things for you under the hood:
- Communicates with the cluster to configure and spin up JupyterLab notebook servers.
- Secures remote access to notebook servers.
- Provides visibility into available resources so that users can request various resource combinations on-demand.
- Integrates with version control systems to set up the working directory with code and credentials, so that users can start being productive immediately.
- Handles storage persistence so that the working directory is kept when users stop and restart their notebook servers.
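The last point — compute that comes and goes while the working directory persists — can be sketched in miniature. The class and method names below are hypothetical illustrations for this post, not Bedrock's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a managed notebook server's lifecycle.
# Names and fields are illustrative, not Bedrock's real interface.

@dataclass
class NotebookServer:
    name: str
    cpu: float                # requested vCPUs
    memory_gb: int            # requested memory
    running: bool = False
    # Stands in for the persistent volume holding the working directory.
    workspace: dict = field(default_factory=dict)

    def start(self) -> None:
        # The platform provisions compute but re-attaches the same
        # persistent workspace, so files survive restarts.
        self.running = True

    def stop(self) -> None:
        # Compute is released; only the workspace volume is kept.
        self.running = False

server = NotebookServer(name="eda", cpu=2.0, memory_gb=8)
server.start()
server.workspace["notebook.ipynb"] = "exploratory work"  # user saves work
server.stop()
server.start()
assert server.workspace["notebook.ipynb"] == "exploratory work"
```

The point of the sketch is the separation of concerns: stopping a server releases compute (and cost) without touching the persisted workspace, which is why users can pick up where they left off.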
It is also notable that notebooks run in the same environment as training pipelines, which lets users iterate on code quickly from within the notebook without having to check it into a git repository and kick off a training pipeline in Bedrock. This reduces friction when users productionise their notebooks by converting exploratory code into a well-organised set of scripts for more maintainable and repeatable training pipelines. This process can be tricky if notebooks run in a different environment, since it involves not only getting the code right, but also making sure that all the moving parts of the runtime environment and configuration (e.g. container image, libraries, credentials, resources) work together seamlessly. Our own data scientists have indeed been caught out once or twice by different Spark versions used in experimentation and production!
When setting up a new notebook on Bedrock, users can easily configure resources based on their needs. After spinning up the notebook server, they can use the familiar JupyterLab interface to conduct their exploratory data analysis and experimentation.
As machine learning models become an integral part of business processes in enterprises, it is extremely important to incorporate good governance processes for models deployed in production. An ML platform can help bake these governance processes into the ML workflow across the organisation. In Bedrock, we are introducing the Approval Workflows feature to help enterprises integrate a model review process, ensuring that ML models work as intended and meet technical and business requirements before they are deployed to production.
With this new feature, we are introducing a new project collaborator role called Review. This role enables collaborators to review every model version before it is allowed to be deployed to production.
As a recap, every Bedrock project can now have collaborators with the following roles:
- Admin: Has full access to the project.
- Train: Has permission to train and upload model versions within the project.
- Deploy: Has permission to deploy model versions as endpoints or batch scoring within the project.
- Train + Deploy: Has permission to both train and deploy.
- Review: Has permission to review the model versions.
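As a rough illustration, the role-to-permission mapping above could be modelled as a simple lookup. The names below mirror the list but are illustrative, not Bedrock's internal representation:

```python
# Illustrative mapping of Bedrock project roles to permissions.
# This is a sketch for exposition, not Bedrock's actual data model;
# "manage" stands in for the Admin role's full project access.
PERMISSIONS = {
    "Admin":          {"train", "deploy", "review", "manage"},
    "Train":          {"train"},
    "Deploy":         {"deploy"},
    "Train + Deploy": {"train", "deploy"},
    "Review":         {"review"},
}

def can(role: str, action: str) -> bool:
    """Return True if a collaborator with `role` may perform `action`."""
    return action in PERMISSIONS.get(role, set())

assert can("Train + Deploy", "deploy")
assert not can("Review", "deploy")   # reviewers approve, they don't deploy
```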
Collaborators assigned the Train role (e.g. data scientists) are now able to request a review on a model version that they have trained. As part of this process, they select reviewers from among those with the Review role, who will get notified of the request.
Image: Screenshot of the submit review request form page
The review request page contains an automatically generated model card that pulls together information collected during the training process, e.g. the data and parameters used, along with training, validation, explainability, and fairness metrics. Data scientists may also use the free text input to enrich the generated fields with more information. This model card gives the reviewers a full picture of the context surrounding the development and behaviour of this model version.
Image: Screenshot of the review request page
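To make the idea concrete, a generated model card might be shaped roughly like the following. Every field name and value here is an invented example for illustration, not Bedrock's actual schema:

```python
# Invented example of an auto-generated model card; the keys mirror the
# categories described above, but names and values are illustrative only.
model_card = {
    "model_version": "v14",
    "training_data": "customer_churn (2021-09 snapshot)",   # assumed example
    "parameters": {"max_depth": 6, "n_estimators": 200},
    "metrics": {
        "training":   {"auc": 0.91},
        "validation": {"auc": 0.88},
        "fairness":   {"demographic_parity_gap": 0.03},
    },
    "notes": "",  # free-text field the requester can enrich
}

# A reviewer-facing one-line summary rendered from the card:
summary = (f"{model_card['model_version']}: "
           f"validation AUC {model_card['metrics']['validation']['auc']}")
```

Pulling these fields together automatically is what saves the requester from assembling the same evidence by hand for every review.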
If the reviewers have any questions or concerns about the model version, they can start a discussion with the requester using the comment panel on the right. Once they are satisfied, the reviewers must approve the review request before the model version can be deployed to production as a batch scoring job or model server.
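The approval gate described above can be sketched as a small state machine: a trained model version moves through review before deployment is allowed. Again, this is a hypothetical illustration, not Bedrock's implementation:

```python
# Hypothetical sketch of the review-then-deploy gate; class, method, and
# reviewer names are illustrative, not Bedrock's actual API.
class ModelVersion:
    def __init__(self, version: str):
        self.version = version
        self.status = "trained"      # trained -> in_review -> approved
        self.reviewers: list[str] = []

    def request_review(self, reviewers: list[str]) -> None:
        # A Train-role collaborator selects reviewers with the Review role.
        self.reviewers = list(reviewers)
        self.status = "in_review"

    def approve(self, reviewer: str) -> None:
        if reviewer not in self.reviewers:
            raise PermissionError("only assigned reviewers may approve")
        self.status = "approved"

    def deploy(self) -> str:
        if self.status != "approved":
            raise RuntimeError("model version must be approved before deployment")
        return f"deployed {self.version}"

mv = ModelVersion("v3")
try:
    mv.deploy()                      # blocked: not yet approved
except RuntimeError:
    pass
mv.request_review(["reviewer_a"])    # hypothetical reviewer name
mv.approve("reviewer_a")
```

The key property is that `deploy` is unreachable until an assigned reviewer has approved, which is exactly the guarantee the workflow provides.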
To learn more about our platform, check out this overview of Bedrock. To see all other features and benefits that Bedrock offers, visit our product documentation, or get in touch with us to book a demo.