Managing your data in the cloud¶
After you’ve become familiar with downloading data from NEON Data API, or from other resources on the internet, into your cloud instances, you’re going to be in a situation where you need to move them and store them somewhere more permanently.
Its important to accept that many of these public data repositories are stable and that data will be available from them in the future.
This means that you should not create copies of original data unless you are in a situation where the data are very large and downloading them again is prohibitive of your time.
Setting up iCommands¶
First, we need to initiate a connection to the CyVerse iRODS.
1. In the Terminal type in
This should echo out a set of information in the terminal:One or more fields in your iRODS environment file (irods_environment.json) are missing; please enter them. Enter the host name (DNS) of the server to connect to:
2. Enter in the following data for each field:
Enter the host name (DNS) of the server to connect to: data.cyverse.org Enter the port number: 1247 Enter your irods user name: user_name Enter your irods zone: iplant Those values will be added to your environment file (for use by other iCommands) if the login succeeds. Enter your current iRODS password:
- host name (DNS):
- port number:
- irods user name:
<your CyVerse username>
- irods zone:
- current iRODS password:
<your current password>
3. You should now be authenticated to the Data Store.
To test, try typing
If you do not echo back anything, try Step 2. again
Uploading with iCommands¶
4. Type in
You should now see the contents of your personal Data Store
5. Upload a single file to the Data Store using
You need to select the file you want to copy, and the location in the Data Store you want to copy it to.iput -KPvf /home/rstudio/neon-shiny-browser/background.R /iplant/home/username/NEON_Downloads/
This command will take a single file
background.R and copy it from the container to the Data Store folder
f are described in the help file.
6. Upload a folder with recursive sub-folders and files
Next, we want to upload an entire directory with many folders and files in it.iput -KPbrvf /home/rstudio/NEON_Downloads/NEON_HARV_DP1.30003.001_2019 /iplant/home/<your-user-name>/NEON_Downloads/
I have added the flags
bfor bulk, and
rfor recursive to the
iputcommand. This will upload the entire directory
NEON_HARV_DP1.30003.001_2019to the data store.
P flag for Progressive and
v flag for verbose will echo out the progress of the upload until it completes.
When it is complete, the terminal should be available again.
To test whether your files are now in CyVerse try:ils /iplant/home/<your-user-name>/NEON_Downloads/ # and then ils /iplant/home/<your-user-name>/NEON_Downloads/NEON_HARV_DP1.30003.001_2019
You should be able to see the contents of your directory in the Data Store
8. These files are now in your private user space. No one can see them, but if you did want to share them, you can do so by modifying their permissions directly in the Discovery Environment, as shown in Step 1, or by using the following commands:
Follow the instructions in the help menu to set the user privileges and ownership.
This example makes your data directory public on the internet as a read-only archive:
ichmod read anonymous /iplant/home/<your-user-name>/NEON_Downloads/
Downloading with iCommands¶
It is also likely that you’re going to download data from the Data Store into your running Apps
9. Use the
ils command to look for some shared data in the Data Store
10. Download a file using
iget -KPvf /iplant/home/username/NEON_Downloads/benchmarking.rmd
This should download an Rmd file into your local instance (whatever current working directory you’re in in terminal)
11. Download a directory using
time iget -KPbvrf /iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2019/
Here we’re using the
timeflag to tell us how long the download takes
Downloading with WebDav¶
CyVerse Data Store also uses WebDav, an https based protocol for read-only data downloads from the Data Store.
We can use
curl commands in the terminal to download files this way.
12. Download a directory using
time wget -r -nH --cut-dirs=5 --no-parent -l8 --reject="index.html*" https://data.cyverse.org/dav-anon/iplant/home/username/NEON_Downloads/NEON_HARV_DP1.30003.001_2017/
again, we’re using the
timefunction to monitor the download speeds.
We’re also using some
wgetflags to just get the data and folders back from the Data Store.
Other Services: Downloading with S3¶
Many organizations are hosting data on Amazon Web Services S3, Google Cloud Storage, or Microsoft Azure.
Cloud buckets, like S3, use HTTPS protocols, just like WebDav.
OpenTopography.org (re)hosts some NEON lidar data, e.g. NEON D17 Pacific Southwest- California
We can download these using their Point Cloud Bulk Data Download option:
aws s3 cp s3://pc-bulk/NEON_D17/ . --recursive --endpoint-url https://opentopography.s3.sdsc.edu --no-sign-request
Fix or improve this documentation
- Search for an answer: CyVerse Learning Center
- Ask us for help: click on the lower right-hand side of the page
- Report an issue or submit a change: Github Repo Link
- Send feedback: learning@CyVerse.org