Data storage and file transfer
Data management on the DCE
Depending on the type of node (GPU or CPU), you can access different mount points with read permissions.
Node pool | Mount point | Content | Size |
---|---|---|---|
cam,tx,sh | /mounts/Datasets1 | Machine learning datasets | 500 GB |
| | /mounts/Datasets2 | Machine learning datasets | 800 GB |
| | /mounts/Datasets3 | Machine learning datasets | 500 GB |
| | /mounts/Datasets4 | Machine learning datasets | 4 TB |
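To see which of these mount points are visible from the node you are on, a small check like the following may help. This is a minimal sketch: check_mounts is a hypothetical helper name, not a command provided by the cluster, and it only tests that each path is a readable directory.

```shell
# check_mounts PATH...: print one status line per path.
# (check_mounts is a hypothetical helper, not a cluster command.)
check_mounts() {
    for mount_path in "$@"; do
        if [ -d "$mount_path" ] && [ -r "$mount_path" ]; then
            echo "available: $mount_path"
        else
            echo "missing:   $mount_path"
        fi
    done
}

# Mount points listed in the table above
check_mounts /mounts/Datasets1 /mounts/Datasets2 /mounts/Datasets3 /mounts/Datasets4
```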
In addition to these mount points, for every job, you can access:
- your home directory, which is independent of the specific node you allocate and whose content remains untouched after your allocation ends
- a temporary drive, accessible through the TMPDIR environment variable, which is completely removed from the node at the end of your allocation. TMPDIR is a space on a local SSD drive of the node
- a large temporary space, accessible through the TMPDIR_LFS environment variable. This is also a temporary space. It does not exist on the cam nodes.
Your home directory is hosted on the network: it is permanent, but access to it is slow. The temporary drive is hosted on the compute node: it is temporary, but access to it is fast.
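Given this trade-off, a common pattern is to copy input data from the home directory to the temporary drive at the start of a job, work there, and copy the results back before the allocation ends. The following is a minimal sketch of that pattern, not an official DCE script: stage_in and stage_out are hypothetical helper names, and the fallback to /tmp when TMPDIR is unset is an assumption that only serves to try the functions outside an allocation.

```shell
# stage_in SRC: copy SRC into the node-local temporary space and print the
# path of the copy. Falls back to /tmp when TMPDIR is unset (assumption:
# only useful for trying this outside an allocation).
stage_in() {
    scratch_root="${TMPDIR:-/tmp}"
    cp -r "$1" "$scratch_root/"
    echo "$scratch_root/$(basename "$1")"
}

# stage_out WORK DEST: copy the content of WORK back to permanent storage.
stage_out() {
    mkdir -p "$2"
    cp -r "$1"/. "$2"/
}

# Typical use inside a job (my_dataset and results are hypothetical names):
#   work="$(stage_in "$HOME/my_dataset")"
#   ... run the computation, reading and writing under "$work" ...
#   stage_out "$work" "$HOME/results"
```

Anything left under TMPDIR that has not been copied back is lost when the allocation ends.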
Data transfer
Shell commands scp and rsync on Linux/Mac
To transfer files to or from the cluster, you can use the shell command scp from your computer.
The following command copies the local directory my_dir to the home directory of the user username on the cluster.
user@mycomputer:~$ ls my_dir/
file01.txt file02.txt
user@mycomputer:~$ scp -r my_dir username@dce.metz.centralesupelec.fr:~/
Password:
file01.txt 100% 132KB 132.3KB/s 00:00
file02.txt 100% 132KB 132.3KB/s 00:00
user@mycomputer:~$
The following command makes a local copy of the directory my_dir from the home directory of the user username.
user@mycomputer:~$ scp -r username@dce.metz.centralesupelec.fr:~/my_dir .
Password:
user@mycomputer:~$
To transfer larger files (when the transfer takes a long time), you should use the shell command rsync with the options --partial --progress. The --partial option makes rsync keep partially transferred files so that, if an error occurs, the transfer can resume from the partial files rather than from scratch. The --progress option tells rsync to print information showing the progress of the transfer.
The following command copies the directory my_dir to the home directory of the user username.
user@mycomputer:~$ ls my_dir/
file01.txt file02.txt
user@mycomputer:~$ rsync --partial --progress -r my_dir username@dce.metz.centralesupelec.fr:~
Password:
sending incremental file list
my_dir/
my_dir/file01.txt
10,737,418,240 100% 121.84MB/s 0:01:24 (xfr#1, to-chk=1/3)
my_dir/file02.txt
10,737,418,240 100% 111.19MB/s 0:01:32 (xfr#2, to-chk=0/3)
user@mycomputer:~$
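Because --partial lets an interrupted transfer resume, the rsync command can simply be run again after a failure. One way to automate this is a small retry loop, sketched below. The retry function is a hypothetical helper written for this example, not an rsync feature, and the attempt count and delay are arbitrary choices.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS is reached.
# (retry is a hypothetical helper, not part of rsync.)
retry() {
    max_attempts="$1"; delay="$2"; shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example: up to 5 attempts, waiting 10 seconds between them
#   retry 5 10 rsync --partial --progress -r my_dir username@dce.metz.centralesupelec.fr:~
```

With password authentication, each new attempt prompts for the password again.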
Using an SCP client on Windows (WinSCP)
To transfer data from a Windows machine, you need an SCP or SFTP client, for example WinSCP or FileZilla.