Metadata-Version: 2.4
Name: bulk_restore_tool
Version: 1.7.14
Summary: Code42 Bulk Restore Tool
Project-URL: Documentation, 
Project-URL: Issues, 
Project-URL: Source, 
Author-email: Code42 Software <integrations@code42.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Requires-Python: >=3.9
Requires-Dist: aiofiles
Requires-Dist: click
Requires-Dist: httpx
Requires-Dist: ijson
Requires-Dist: pydantic[dotenv]==1.*
Requires-Dist: rich
Requires-Dist: sqlite-utils
Description-Content-Type: text/markdown

# Bulk Restore Tool (BRT)


The Bulk Restore Tool is a Python library and command-line utility that efficiently restores the contents of
a Code42 preservation archive without requiring a Code42 agent to be installed on the target machine.

The tool is published to an AWS S3 bucket served by a CloudFront distribution at https://pypi.us.code42.com.

It can be installed by running:

```bash
python3 -m pip install --extra-index-url https://pypi.us.code42.com bulk-restore-tool
```

# Performing Bulk Restores

## Defining a job

The bulk restore tool creates and runs "restore jobs". A job is a collection of archives to restore from and the file
selection criteria applied to those archives.

Each job has a name associated with it (if the user does not provide one, the tool generates a new UUID to use as the
"name"). This name appears in the Code42 Audit Log entries for each archive restore session created.

At a bare minimum, a job requires a list of archiveIds to restore; the default file selection then applies.

Minimal job as a .json file input:
```json
{
  "archives": [
    {"archiveId":  "1234"}
  ]
}
```

Minimal job created in Python:
```python
from bulk_restore_tool import JobDefinition

job = JobDefinition(archives=[{"archiveId":  "1234"}])
```

The optional parameters for a job are as follows:

- **jobId**: `str` Identifier for the job (will show in Audit Log entries). If not provided, a new UUID will be generated.
- **type**: `Optional[str]` Optional indicator of how the archives in the job were selected, i.e. by archive, device, username, or legal hold ID.
- **identifier**: `Optional[str]` Optional indicator of what identifiers were provided of the given `type`.
- **selection**: `FileSelection` Selection criteria for what files to restore from the archive.
- **targetFolder**: `Path` Directory where the job metadata and restored files should be saved (defaults to the current working directory).
- **zipResults**: `bool` Indicates if the restored files should be compressed to a zip archive per device in the job.
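
As a rough illustration, the sketch below builds a job definition that sets each of the optional parameters and writes it to a .json file using only the standard library. It assumes the JSON keys match the parameter names listed above. The helper names (`build_job_definition`, `write_job_definition`), the example values, and the `restores` folder are hypothetical, and the `restore_<jobId>.json` file name is only a naming choice for this example.

```python
import json
import tempfile
from pathlib import Path


def build_job_definition(archive_ids, job_id, target_folder):
    """Return a job definition dict (keys assumed to mirror the documented parameter names)."""
    return {
        "jobId": job_id,
        "type": "username",                # how the archives were selected
        "identifier": "john@example.com",  # illustrative value
        "archives": [{"archiveId": archive_id} for archive_id in archive_ids],
        "selection": {"includeDeleted": True},
        "targetFolder": str(target_folder),
        "zipResults": False,
    }


def write_job_definition(definition, directory):
    """Write the definition as JSON and return the path of the file created."""
    path = Path(directory) / f"restore_{definition['jobId']}.json"
    path.write_text(json.dumps(definition, indent=2))
    return path


# Write to a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    definition = build_job_definition(["1234", "5678"], "my_job", Path(tmp) / "restores")
    job_file = write_job_definition(definition, tmp)
    loaded = json.loads(job_file.read_text())
```

Generating the file from code like this can be convenient when the list of archiveIds comes from another system, but whether the tool accepts every key in this exact shape should be checked against a file produced by `brt create-job`.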

Using the `brt` command-line tool, you can also create jobs easily in the terminal. The `brt create-job` command accepts
the following types of identifiers to build the list of archives automatically for you: 

- archive
- device
- username
- legalHold

For example, to restore all the files for user `john@example.com`, who has 3 Code42 devices, each backing up to dual
destinations (so 2 archives per device), run the following:

```bash
brt create-job --type username john@example.com
```

This creates a job definition .json file populated with the default file selection and all the archives
owned by `john@example.com`.

## Running a job

Once a job definition has been created, you can run the job either from the command line (which requires a .json file
of the job definition):

```bash
brt start-job restore_<job_name>.json
```

Or within a Python script with the `.start()` method of the `RestoreJob` class:

```python
from bulk_restore_tool import JobDefinition, RestoreJob

definition = JobDefinition(
    jobId="my_job",
    archives=[{"archiveId": "1234"}],
    selection={"includeDeleted": True},
)
job = RestoreJob(definition=definition)

# NOTE: if no credential args are passed to the `RestoreJob` constructor, the tool attempts to read credentials
# from these shell environment variables:
# - BRT_URL
# - BRT_API_CLIENT_ID
# - BRT_API_CLIENT_SECRET
#
# Otherwise, construct the `RestoreJob` class by passing authentication parameters directly:
# job = RestoreJob(definition=definition, url="<url>", api_client_id="<api_client>", api_client_secret="<secret>")

job.start()
```
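
When relying on environment variables for credentials, it may help to confirm they are set before starting a long-running job. The following is a minimal sketch using only the standard library; `missing_credentials` is a hypothetical helper and not part of the `bulk_restore_tool` package. Only the three variable names come from the note above.

```python
import os

REQUIRED_ENV_VARS = ("BRT_URL", "BRT_API_CLIENT_ID", "BRT_API_CLIENT_SECRET")


def missing_credentials(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Set these variables before running the job: " + ", ".join(missing))
```

Accepting an optional mapping keeps the check easy to exercise in tests without touching the real environment.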

## Job Metadata

Before beginning a restore, a job needs to be prepared by fetching the archive metadata (which it writes to .json files
for each archive in the target directory) and then evaluating that metadata against the selection criteria.

For each device, a SQLite database is created, and all file records from any archives that device owns are stored
in the database's `file` table. The bulk restore tool then applies the file selection filters, setting the `file.status`
column to either "SELECTED" or "EXCLUDED" (the default file selection is to include everything).
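
To make the selection step more concrete, here is a small sketch using the standard-library `sqlite3` module with an in-memory database. Only the `file` table name, the `status` column, and the "SELECTED"/"EXCLUDED" values come from the description above. The `path` and `deleted` columns, the sample rows, and the filter rule are assumptions made for illustration; the tool's real schema is likely different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: only the `file` table and its `status` column are documented.
conn.execute("CREATE TABLE file (path TEXT, deleted INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO file (path, deleted) VALUES (?, ?)",
    [
        ("/home/john/report.docx", 0),
        ("/home/john/old_notes.txt", 1),
        ("/home/john/photo.png", 0),
    ],
)

# Example filter: exclude deleted files and select everything else.
conn.execute(
    "UPDATE file SET status = CASE WHEN deleted = 1 THEN 'EXCLUDED' ELSE 'SELECTED' END"
)

# Summarize how many files ended up in each status.
counts = dict(conn.execute("SELECT status, COUNT(*) FROM file GROUP BY status"))
print(counts)
```

A similar `GROUP BY status` query against a prepared job database could serve as a quick check of how many files a selection will restore, assuming the real table exposes the column as described.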

Metadata is automatically prepared when running the `brt create-job` command, unless the `--no-calculate` option is
provided. When creating a job from a Python script, `.start()` will prepare the metadata automatically if metadata files
don't yet exist in the target directory. You can also fetch metadata without starting the job by calling the
`RestoreJob.prepare()` method.


## Restore Client

The `bulk_restore_tool` package also exposes a helper client for making some restore-related API calls directly.

```python
from bulk_restore_tool import Client, JobDefinition
from bulk_restore_tool.models import Archive

# NOTE: if no credential args are passed to the `Client` constructor, the tool attempts to read credentials
# from the shell environment variables
client = Client(url="<url>", api_client_id="<api_client>", api_client_secret="<secret>")

# get lists of archives (that can be used directly in a `JobDefinition`):
user_archives = client.get_archive_details(id_type="username", identifier="user@example.com")
device_archives = client.get_archive_details(id_type="device", identifier="<device_guid>")
legal_hold_archives = client.get_archive_details(id_type="legalHold", identifier="<legal_hold_Uid>")
all_archives = user_archives + device_archives + legal_hold_archives
definition = JobDefinition(archives=all_archives)


# get archive metadata for a single archive:
user_archive: Archive = user_archives[0]
response = client.get_archive_metadata(archive=user_archive, includeDeleted=True)


# restore a single fileId:
sessionId = client.create_bulk_restore_session(archive=user_archive, name="File Demo")
response = client.get_file(archive=user_archive, fileId="abcd1234", versionTimestamp=1677884124339)
with open("restored_file", "wb") as file:
    file.write(response.content)
```
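
The `versionTimestamp` value in the example above looks like a Unix timestamp in milliseconds. That is an assumption based on its magnitude, not something this README states. Under that assumption, the sketch below converts between timezone-aware `datetime` objects and millisecond timestamps; `to_epoch_millis` and `from_epoch_millis` are hypothetical helpers, not part of the package.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


# Assumption: the example value is epoch milliseconds.
version_time = from_epoch_millis(1677884124339)
print(version_time.isoformat())
```

Using `timedelta` arithmetic rather than floating-point seconds avoids rounding errors in the millisecond digit.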

## Implement a custom restore writer

The `RestoreJob` class uses the `bulk_restore_tool.RestoreWriter` interface when deciding what to do with the restored
file data. To customize how and where files are written, you can implement a subclass of `RestoreWriter` and pass an
instance of it as the `writer` parameter when constructing the `RestoreJob`.

The interface definition:

```python
import abc
from typing import Union

from bulk_restore_tool.models import RestorableFile, LinuxMetadata, MacMetadata, WindowsMetadata

class RestoreWriter(abc.ABC):
    """
    Interface for enabling flexible restore targets.
    """

    @abc.abstractmethod
    async def write_chunk(self, file: RestorableFile, chunk: bytes, forceWinPathSanitization: bool):
        """
        The restore job will asynchronously download multiple files, so the writer needs to be able to asynchronously
        handle/process individual chunks. The `file` param provides any necessary file metadata (path, checksum, size,
        etc.) needed for processing the chunk.
        """
        ...
    
    @abc.abstractmethod
    async def write_metadata(self, file: RestorableFile, metadata: Union[LinuxMetadata, WindowsMetadata, MacMetadata]):
        """
        Provides detailed file metadata that can either be applied directly to the restored files on disk, or saved
        elsewhere as a reference. This method is called after all `.write_chunk()` calls are made for the file contents.
        """
        ...

    @abc.abstractmethod
    async def file_completed(self, file: RestorableFile, forceWinPathSanitization: bool):
        """
        Once the download is complete, the restore job will call this method indicating to the writer that there
        are no more bytes to write for the given file.
        """
        ...

    @abc.abstractmethod
    async def job_completed(self):
        """
        This method is called once all files have been downloaded by the job, allowing the writer to complete
        any final processing/cleanup before the BRT exits.
        """
        ...
```
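
As an illustration of the shape such a writer might take, the sketch below computes a SHA-256 digest for each file instead of writing anything to disk. To stay runnable without the package installed, it uses a stand-in `FakeFile` class whose `path` attribute is an assumption about `RestorableFile`, and `HashingWriter` does not inherit from anything. A real implementation would subclass `bulk_restore_tool.RestoreWriter`. The call order in `demo()` follows the docstrings above but is otherwise a guess at how the job drives the writer.

```python
import asyncio
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeFile:
    """Stand-in for RestorableFile; the `path` attribute is an assumption."""
    path: str


class HashingWriter:
    """Collects a SHA-256 digest per file instead of writing to disk.

    In real use this would subclass bulk_restore_tool.RestoreWriter.
    """

    def __init__(self):
        self._hashers = {}   # in-progress hash objects, keyed by file path
        self.digests = {}    # finished hex digests, keyed by file path
        self.finished = False

    async def write_chunk(self, file, chunk, forceWinPathSanitization):
        # Chunks for different files may interleave, so keep one hasher per file.
        if file.path not in self._hashers:
            self._hashers[file.path] = hashlib.sha256()
        self._hashers[file.path].update(chunk)

    async def write_metadata(self, file, metadata):
        pass  # metadata is ignored in this sketch

    async def file_completed(self, file, forceWinPathSanitization):
        # An empty file may receive no chunks, so fall back to a fresh hasher.
        hasher = self._hashers.pop(file.path, hashlib.sha256())
        self.digests[file.path] = hasher.hexdigest()

    async def job_completed(self):
        self.finished = True


async def demo():
    """Drive the writer by hand, roughly the way a restore job might."""
    writer = HashingWriter()
    file = FakeFile(path="/docs/report.txt")
    for chunk in (b"hello ", b"world"):
        await writer.write_chunk(file, chunk, False)
    await writer.write_metadata(file, None)
    await writer.file_completed(file, False)
    await writer.job_completed()
    return writer


writer = asyncio.run(demo())
```

Keeping per-file state in a dictionary matters because the job downloads multiple files asynchronously, so a writer cannot assume that all chunks for one file arrive before the next file begins.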