Galaxy dataset deletion
Data deletion
Managing library datasets is a bit complex, so here is a scenario that hopefully provides clarification. The complexities of handling library datasets is mostly contained in the delete_datasets() method in cleanup_datasets.py script.
Assume we have 1 library dataset with: LibraryDatasetDatasetAssociation -> LibraryDataset and Dataset. At this point, we have the following database column values:
LibraryDatasetDatasetAssociation deleted: False LibraryDataset deleted: False, purged: False Dataset deleted: False purged: False
1. A user deletes the assumed dataset above from a data library via a UI menu option. This action results in the following database column values (changes from previous step marked with *):
LibraryDatasetDatasetAssociation deleted: False LibraryDataset deleted: True*, purged: False Dataset deleted: False, purged: False
2. After the number of days configured for the delete_datasets() method (option -6 below) have passed, execution of the delete_datasets() method results in the following database column values (changes from previous step marked with *):
LibraryDatasetDatasetAssociation deleted: True* LibraryDataset deleted: True, purged: True* Dataset deleted: True*, purged: False
3. After the number of days configured for the purge_datasets() method (option -3 below) have passed, execution of the purge_datasets() method results in the following database column values (changes from previous step marked with *):
LibraryDatasetDatasetAssociation deleted: True LibraryDataset deleted: True, purged: True Dataset deleted: True, purged: True* (dataset file removed from disk if -r flag is used)
This scenario is about as simple as it gets.
Keep in mind that a Dataset object can have many HistoryDatasetAssociations and many LibraryDatasetDatasetAssociations, and a LibraryDataset can have many LibraryDatasetDatasetAssociations.
Another way of stating it is: LibraryDatasetDatasetAssociation objects map LibraryDataset objects to Dataset objects, and Dataset objects may be mapped to History objects via HistoryDatasetAssociation objects.
References
- The source code: cleanup_datasets.py.
- Galaxy mailing list.