Data storage and file transfer

Data management on the DCE

Depending on the type of node you allocate (GPU or CPU), different mount points are available with read permission.

Node pool    Mount point          Content                      Size
cam,tx,sh    /mounts/Datasets1    Machine learning datasets    500 GB
             /mounts/Datasets2    Machine learning datasets    800 GB
             /mounts/Datasets3    Machine learning datasets    500 GB
             /mounts/Datasets4    Machine learning datasets    4 TB
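Since the available mount points depend on the node, it can help to check from inside a job which ones are actually visible. The following is a minimal sketch; the helper name report_dirs is hypothetical and not part of the cluster tooling.

```shell
# report_dirs DIR... : print whether each directory is readable from this node.
# (hypothetical helper, shown for illustration only)
report_dirs() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir" ] && [ -r "$dir" ]; then
            echo "readable: $dir"
        else
            echo "missing:  $dir"
        fi
    done
}

# Check the dataset mount points listed in the table above.
report_dirs /mounts/Datasets1 /mounts/Datasets2 /mounts/Datasets3 /mounts/Datasets4
```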

In addition to these mount points, every job can access:

  • your home directory, which does not depend on the specific node you allocate and whose content remains untouched after your allocation ends
  • a temporary drive, accessible through the TMPDIR environment variable, whose content is completely removed from the node at the end of your allocation. TMPDIR points to a space on a local SSD of the node
  • a large temporary space, accessible through the TMPDIR_LFS environment variable. This space is also temporary, and it does not exist on the cam nodes.

Your home directory is hosted on the network: it is permanent, but access is slow. The temporary drive is hosted on the compute node: it is temporary, but access is fast.
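One common way to combine the two spaces is to copy input data to the temporary drive at the start of a job, work there, and copy the results back to the home directory before the allocation ends. The sketch below illustrates this pattern under stated assumptions: the function names stage_in and stage_out and the input/output layout are hypothetical, and /tmp is used as a fallback when TMPDIR is not set (for instance outside an allocation).

```shell
# stage_in SRC_DIR : copy SRC_DIR to a fresh working directory under TMPDIR
# and print the path of that working directory.
# (hypothetical helper; the input/ and output/ layout is an assumption)
stage_in() {
    local src="$1" work
    work="$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")" || return 1
    cp -r "$src" "$work/input" || return 1
    mkdir -p "$work/output"
    echo "$work"
}

# stage_out WORK_DIR DEST_DIR : copy the content of WORK_DIR/output to
# DEST_DIR, which should live in the permanent home directory.
stage_out() {
    local work="$1" dest="$2"
    mkdir -p "$dest" || return 1
    cp -r "$work/output/." "$dest/"
}

# Typical use inside a job (paths are placeholders):
#   work="$(stage_in "$HOME/my_dataset")"
#   ... run your program, reading $work/input and writing to $work/output ...
#   stage_out "$work" "$HOME/results"
```

Anything left in the temporary drive is lost when the allocation ends, so the copy back to the home directory has to happen inside the job itself.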

Data transfer

Shell commands scp and rsync on Linux/Mac

To transfer files to or from the cluster, you can run the shell command scp from your computer.

The following command copies the local directory my_dir to the home directory of the user username on the cluster.

user@mycomputer:~$ ls my_dir/
file01.txt  file02.txt
user@mycomputer:~$ scp -r my_dir username@dce.metz.centralesupelec.fr:~/
Password:
file01.txt                                                                       100%  132KB 132.3KB/s   00:00
file02.txt                                                                       100%  132KB 132.3KB/s   00:00
user@mycomputer:~$

The following command copies the directory my_dir from the home directory of the user username on the cluster to the current local directory.

user@mycomputer:~$ scp -r username@dce.metz.centralesupelec.fr:~/my_dir .
Password:
user@mycomputer:~$

To transfer bigger files (when the transfer takes long enough that it may be interrupted), you should use the shell command rsync with the options --partial --progress. The --partial option tells rsync to keep partially transferred files, so that if an error occurs the transfer can restart from the partial files and not from scratch. The --progress option tells rsync to print information showing the progress of the transfer. The following command copies the directory my_dir to the home directory of the user username.

user@mycomputer:~$ ls my_dir/
file01.txt  file02.txt
user@mycomputer:~$ rsync --partial --progress -r my_dir username@dce.metz.centralesupelec.fr:~
Password:
sending incremental file list
my_dir/
my_dir/file01.txt
 10,737,418,240 100%  121.84MB/s    0:01:24 (xfr#1, to-chk=1/3)
my_dir/file02.txt
 10,737,418,240 100%  111.19MB/s    0:01:32 (xfr#2, to-chk=0/3)
user@mycomputer:~$
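If a long transfer is interrupted, running the same rsync command again resumes it from the partial files. This can be automated with a small retry loop, sketched below. The function name retry and the variables MAX_TRIES and RETRY_DELAY are hypothetical, not rsync features. Note that with password authentication each new attempt asks for the password again.

```shell
# retry CMD [ARGS...] : rerun a command until it succeeds.
# MAX_TRIES (default 5) bounds the number of attempts,
# RETRY_DELAY (default 10) is the pause in seconds between attempts.
# (hypothetical helper, shown for illustration only)
retry() {
    local max_tries="${MAX_TRIES:-5}" delay="${RETRY_DELAY:-10}" attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_tries" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Typical use from your computer:
#   retry rsync --partial --progress -r my_dir username@dce.metz.centralesupelec.fr:~
```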

Using an SCP client on Windows (WinSCP)

To transfer data from a Windows machine, an SCP or SFTP client is required, for example WinSCP or FileZilla.