Data storage and file transfer¶
Data management on the DCE¶
Depending on the type of node you allocate (GPU or CPU), you can access different mount points with read permission.
|Node pool||Mount point||Content||Size|
|cam, tx, sh||/mounts/Datasets1||Machine learning datasets||500 GB|
| ||/mounts/Datasets2||Machine learning datasets||800 GB|
| ||/mounts/Datasets3||Machine learning datasets||500 GB|
| ||/mounts/Datasets4||Machine learning datasets||4 TB|
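As a quick check from inside an allocation, the following sketch reports which of the mount points listed in the table are readable on the current node. The function name check_mounts is hypothetical and only serves this illustration.

```shell
# Report, for each mount point from the table, whether it is readable here.
# check_mounts is a hypothetical helper name, not a cluster-provided command.
check_mounts() {
  for mount_point in /mounts/Datasets1 /mounts/Datasets2 /mounts/Datasets3 /mounts/Datasets4; do
    if [ -d "$mount_point" ] && [ -r "$mount_point" ]; then
      echo "$mount_point: readable"
    else
      echo "$mount_point: not available on this node"
    fi
  done
}

check_mounts
```

Run outside of the cluster, every mount point is simply reported as not available.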
In addition to these mount points, every job can access:
- your home directory, which is independent of the specific node you allocate and whose content remains untouched after your allocation ends
- a temporary drive, accessible through the TMPDIR environment variable, which is completely removed from the node at the end of your allocation. TMPDIR is a space on a local SSD drive of the node
- a large temporary space, accessible through the TMPDIR_LFS environment variable. This is also a temporary space. This space does not exist on the
Your home directory is hosted on the network: it is permanent, but access to it is slow. The temporary drive is hosted on the compute node: it is temporary, but access to it is fast.
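A common way to combine the two storage types is to stage the input onto the fast temporary drive, work there, and copy only the results back to permanent storage before the allocation ends. The sketch below illustrates this pattern under assumptions: SRC_DIR, RESULTS_DIR and the file names are hypothetical, and the tr command merely stands in for a real computation. When TMPDIR is not set, it falls back to /tmp so that the sketch also runs outside of an allocation.

```shell
#!/bin/sh
set -eu

# Hypothetical locations: on the cluster, SRC_DIR could be a dataset directory
# and RESULTS_DIR a directory in your home. Throwaway defaults are used here.
SRC_DIR="${SRC_DIR:-$(mktemp -d)}"
RESULTS_DIR="${RESULTS_DIR:-$(mktemp -d)}"

# Private scratch directory on the fast node-local drive (TMPDIR),
# falling back to /tmp when TMPDIR is not defined.
SCRATCH="$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")"

# Placeholder input so that the sketch is self-contained.
echo "sample" > "$SRC_DIR/input.txt"

# 1. Stage the input onto the local scratch space.
cp -r "$SRC_DIR/." "$SCRATCH/"

# 2. Do the work on the scratch space (placeholder computation).
tr 'a-z' 'A-Z' < "$SCRATCH/input.txt" > "$SCRATCH/output.txt"

# 3. Copy the results back to permanent storage: anything left
#    on the temporary drive is lost when the allocation ends.
cp "$SCRATCH/output.txt" "$RESULTS_DIR/"
```

Reading and writing many small files on the local drive and copying results back once keeps the slow network storage out of the inner loop of the job.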
scp and rsync on Linux/Mac¶
To transfer files to or from the cluster, you can use the shell command scp from your computer.
The following command copies the directory my_dir to the home directory of the user username.
    user@mycomputer:~$ ls my_dir/
    file01.txt file02.txt
    user@mycomputer:~$ scp -r my_dir email@example.com:~/
    Password:
    file01.txt 100% 132KB 132.3KB/s 00:00
    file02.txt 100% 132KB 132.3KB/s 00:00
    user@mycomputer:~$
The following command makes a local copy of the directory my_dir from the home directory of the user username.
    user@mycomputer:~$ scp -r firstname.lastname@example.org:~/my_dir .
    Password:
    user@mycomputer:~$
To transfer bigger files (when the transfer takes long enough that an interruption becomes likely), you should use the shell command rsync with the options --partial --progress. The --partial option tells rsync to keep partially transferred files so that, if an error occurs, the transfer can restart from the partial files and not from scratch. The --progress option tells rsync to print information showing the progress of the transfer.
The following command copies the directory my_dir to the home directory of the user username.
    user@mycomputer:~$ ls my_dir/
    file01.txt file02.txt
    user@mycomputer:~$ rsync --partial --progress -r my_dir email@example.com:~
    Password:
    sending incremental file list
    my_dir/
    my_dir/file01.txt
    10,737,418,240 100% 121.84MB/s 0:01:24 (xfr#1, to-chk=1/3)
    my_dir/file02.txt
    10,737,418,240 100% 111.19MB/s 0:01:32 (xfr#2, to-chk=0/3)
    user@mycomputer:~$
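Since --partial keeps partially transferred files, an interrupted transfer can simply be relaunched with the same command. The sketch below wraps that idea in a small retry loop. The function name retry and the RETRY_DELAY variable are hypothetical, and the rsync invocation is shown only as a comment because the login and host name depend on your account.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most MAX_ATTEMPTS times.
# retry and RETRY_DELAY are hypothetical names used for this illustration.
retry() {
  retry_max="$1"
  shift
  retry_attempt=1
  until "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      echo "giving up after $retry_attempt attempts" >&2
      return 1
    fi
    retry_attempt=$((retry_attempt + 1))
    # Wait a little before relaunching (5 seconds unless RETRY_DELAY is set).
    sleep "${RETRY_DELAY:-5}"
  done
  return 0
}

# Example usage, with your own login and host in place of <login>@<host>:
#   retry 5 rsync --partial --progress -r my_dir <login>@<host>:~
```

Each relaunch lets rsync reuse the files kept by --partial, so only the missing part of the data has to be sent again. Unless key-based authentication is set up, every attempt will prompt for the password.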