Stata
Stata is a commercial statistical package, which provides a complete solution for data analysis, data management, and graphics.
The University of Luxembourg contributes to a campus-wide license; see the SIU / Service Now Knowledge Base ticket on Stata MP2.
Available versions of Stata on ULHPC platforms¶
To check the available versions of Stata at ULHPC, type module spider stata.
math/Stata/<version>
Once loaded, the module provides the following binaries:
Binary | Description |
---|---|
stata | Non-graphical standard Stata/IC. For better performance and support for larger databases, stata-se should be used. |
stata-se | Non-graphical Stata/SE designed for large databases. Can be used to run tasks automatically with the batch flag -b and a Stata *.do file. |
xstata | Graphical standard Stata/IC. For better performance and support for larger databases, xstata-se should be used. |
xstata-se | Graphical Stata/SE designed for large databases. Can be used interactively in a working environment similar to the Windows and Mac versions. |
Interactive Mode¶
To open a Stata session in interactive mode, follow these steps:
If you intend to use the GUI, connect to the ULHPC login node with the -X (or -Y) option:
ssh -X iris-cluster # OR on Mac OS: ssh -Y iris-cluster
ssh -X aion-cluster # OR on Mac OS: ssh -Y aion-cluster
Then you can reserve an interactive job, for instance with 2 cores. Don't forget to use the --x11 option if you intend to use the GUI.
$ si --x11 -c2 # You CANNOT use more than 2 cores
# Load the module Stata and needed environment
(node)$ module purge
(node)$ module load math/Stata
# Non-Graphical version (CLI)
(node)$ stata
___ ____ ____ ____ ____ ®
/__ / ____/ / ____/ 17.0
___/ / /___/ / /___/ BE—Basic Edition
Statistics and Data Science Copyright 1985-2021 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: Unlimited-user network, expiring 31 Dec 2022
Serial number: <serial>
Licensed to: University of Luxembourg
Campus License - see KB0010885 (Service Now)
.
# To quit Stata
. exit, clear
# To run the GUI version, over X11 (xstata is the graphical binary)
(node)$ xstata &
Location of your ado files¶
Run the sysdir command to see the search path for ado files:
. sysdir
STATA: /opt/apps/resif/<cluster>/<version>/<arch>/software/Stata/<stataversion>/
BASE: /opt/apps/resif/<cluster>/<version>/<arch>/software/Stata/<stataversion>/ado/base/
SITE: /opt/apps/resif/<cluster>/<version>/<arch>/software/Stata/<stataversion>/software/Stata/ado/
PLUS: ~/ado/plus/
PERSONAL: ~/ado/personal/
You should thus store your own ado files in $HOME/ado/personal. For more details, see this document.
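As a minimal sketch of the advice above, the following shell snippet creates the personal ado directory if it is missing and copies a user-written ado file into it. The helper name install_personal_ado, the ADO_PERSONAL variable and the file name myprog.ado are hypothetical and chosen for illustration only.

```shell
#!/bin/bash
# Sketch only: ADO_PERSONAL and install_personal_ado are illustrative names.
# The default matches the PERSONAL entry reported by sysdir (~/ado/personal).
ADO_PERSONAL="${ADO_PERSONAL:-$HOME/ado/personal}"

install_personal_ado() {
    # $1: path to an ado file you wrote or downloaded
    local src="$1"
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # Create the directory on first use, then copy the file into it
    mkdir -p "$ADO_PERSONAL" && cp "$src" "$ADO_PERSONAL/"
}

# Example (hypothetical file name):
# install_personal_ado ./myprog.ado
```

After copying, running sysdir inside Stata should still list the same PERSONAL path, so no further configuration is expected to be needed.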
Batch mode¶
To run Stata in batch mode, you need to create do-files which contain the series of commands you would like to run.
With a do-file (filename.do) in hand, you can run it from the shell command line with:
stata -b do filename.do
With the -b flag, the output is automatically saved to the log file filename.log.
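To make the batch workflow concrete, here is a minimal sketch that writes a small do-file and runs it with the -b flag when a stata binary is on the PATH. The file name example.do and the helper stata_log_name are assumptions made for illustration; the do-file only uses the auto dataset shipped with Stata.

```shell
#!/bin/bash
# Sketch only: example.do and stata_log_name are illustrative names.

# Batch mode writes the log to the current directory, named after the do-file.
stata_log_name() {
    echo "$(basename "$1" .do).log"
}

DOFILE="example.do"

# Write a tiny do-file using the auto dataset that ships with Stata.
cat > "$DOFILE" <<'EOF'
sysuse auto, clear
summarize price mpg
EOF

# Run it in batch mode only if Stata is available (i.e. the module is loaded).
if command -v stata >/dev/null 2>&1; then
    stata -b do "$DOFILE"
    echo "Output written to $(stata_log_name "$DOFILE")"
else
    echo "stata not found: load the module first (module load math/Stata)" >&2
fi
```

On the cluster, the same command would normally be wrapped in a Slurm launcher such as the ones that follow, rather than run on a login node.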
#!/bin/bash -l
#SBATCH -J Stata
###SBATCH -A <project_name>
#SBATCH --ntasks-per-node 1
#SBATCH -c 1
#SBATCH --time=00:30:00
#SBATCH -p batch
# Load the module Stata
module purge
module load math/Stata
srun stata -b do INPUTFILE.do
#!/bin/bash -l
#SBATCH -J Stata
###SBATCH -A <project_name>
#SBATCH --ntasks-per-node 1
#SBATCH -c 2
#SBATCH --time=00:30:00
#SBATCH -p batch
# Load the module Stata
module purge
module load math/Stata
# Use stata-mp to run across multiple cores
srun -c $SLURM_CPUS_PER_TASK stata-mp -b do INPUTFILE.do
Running Stata in Parallel¶
Stata/MP¶
You can use Stata/MP to take advantage of its advanced multiprocessing capabilities. Stata/MP provides the most extensive multicore support of any statistics and data management package.
Note, however, that the current license limits the number of cores to 2. Example of interactive usage:
$ si --x11 -c2 # You CANNOT use more than 2 cores
# Load the module Stata and needed environment
(node)$ module purge
(node)$ module load math/Stata
# Non-Graphical version (CLI)
(node)$ stata-mp
___ ____ ____ ____ ____ ®
/__ / ____/ / ____/ 17.0
___/ / /___/ / /___/ MP—Parallel Edition
Statistics and Data Science Copyright 1985-2021 StataCorp LLC
StataCorp
4905 Lakeway Drive
College Station, Texas 77845 USA
800-STATA-PC https://www.stata.com
979-696-4600 stata@stata.com
Stata license: Unlimited-user 2-core network, expiring 31 Dec 2022
Serial number: <serial>
Licensed to: University of Luxembourg
Campus License - see KB0010885 (Service Now)
. set processors 2 # or use the SLURM_CPUS_PER_TASK environment variable
. [...]
. exit, clear
Note that when you use the stata-mp executable, Stata automatically uses the number of cores requested through Slurm's --cpus-per-task option.
This implicit parallelism does not require any changes to your code.
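If you prefer to set the number of processors explicitly, one possible sketch is to derive it in the launcher from Slurm's SLURM_CPUS_PER_TASK variable and cap it at the 2 cores allowed by the license. The helper stata_cores and the wrapper file run_mp.do are assumptions made for illustration, and INPUTFILE.do is a placeholder as in the launchers above.

```shell
#!/bin/bash
# Sketch only: stata_cores and run_mp.do are illustrative names.
MAX_LICENSED_CORES=2   # limit stated by the current license

# Print the core count to give Stata/MP: the Slurm allocation
# (1 if the variable is unset), capped at the licensed maximum.
stata_cores() {
    local n="${SLURM_CPUS_PER_TASK:-1}"
    if [ "$n" -gt "$MAX_LICENSED_CORES" ]; then
        n="$MAX_LICENSED_CORES"
    fi
    echo "$n"
}

# Generate a wrapper do-file that sets the processor count and then
# runs the real analysis (INPUTFILE.do is a placeholder).
cat > run_mp.do <<EOF
set processors $(stata_cores)
do INPUTFILE.do
EOF

# On a compute node, with the module loaded:
# srun -c "$SLURM_CPUS_PER_TASK" stata-mp -b do run_mp.do
```

Capping in the launcher keeps the do-file itself unchanged, so the same analysis can still be run interactively.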
User-packages parallel and gtools¶
User-developed Stata packages can be installed from a login node with the Stata command:
net install <package>
These packages will be installed in your home directory by default.
Among others, the parallel package implements parallel for loops, and the gtools package provides faster alternatives to some Stata commands when working with big data.
(node)$ stata
# installation
. net install parallel, from(https://raw.github.com/gvegayon/parallel/stable/) replace
checking parallel consistency and verifying not already installed...
installing into /home/users/svarrette/ado/plus/...
installation complete.
# update index of the installed packages
. mata mata mlib index
.mlib libraries to be searched are now
lmatabase;lmatasvy;lmatabma;lmatapath;lmatatab;lmatanumlib;lmatacollect;lmatafc;lmatapss;lmat
> asem;lmatamixlog;lmatamcmc;lmatasp;lmatameta;lmataopt;lmataado;lmatagsem;lmatami;lmatapostest;l
> matalasso;lmataerm;lparallel
# initialize - ADAPT to the value of SLURM_CPUS_PER_TASK
. parallel initialize 4, f # Or (better) find a way to use the SLURM_CPUS_PER_TASK environment variable
N Child processes: 4
Stata dir: /mnt/irisgpfs/apps/resif/iris/2020b/broadwell/software/Stata/17/stata
. sysuse auto
(1978 automobile data)
. parallel, by(foreign): egen maxp = max(price)
Small workload/num groups. Temporarily setting number of child processes to 2
--------------------------------------------------------------------------------
Parallel Computing with Stata
Child processes: 2
pll_id : bcrpvqtoi1
Running at : /mnt/irisgpfs/users/svarrette
Randtype : datetime
Waiting for the child processes to finish...
child process 0002 has exited without error...
child process 0001 has exited without error...
--------------------------------------------------------------------------------
Enter -parallel printlog #- to checkout logfiles.
--------------------------------------------------------------------------------
. tab maxp
maxp | Freq. Percent Cum.
------------+-----------------------------------
12990 | 22 29.73 29.73
15906 | 52 70.27 100.00
------------+-----------------------------------
Total | 74 100.00
. exit, clear
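The session above hard-codes 4 child processes. One possible way to follow the Slurm allocation instead, not taken from the original page, is to generate the do-file from the launcher and read the variable inside Stata with the ": environment" macro function. The file name parallel_job.do is hypothetical, and the fallback to 1 child process outside a Slurm job is an assumption.

```shell
#!/bin/bash
# Sketch only: parallel_job.do is an illustrative file name.
# The quoted 'EOF' keeps the shell from expanding Stata's `macro' quotes.
cat > parallel_job.do <<'EOF'
* Read the core count granted by Slurm (empty outside a Slurm job)
local ncores : environment SLURM_CPUS_PER_TASK
if "`ncores'" == "" local ncores 1

parallel initialize `ncores', f
sysuse auto, clear
parallel, by(foreign): egen maxp = max(price)
tab maxp
EOF

# Inside a job reserved with e.g. -c 4, with the module loaded:
# stata -b do parallel_job.do
```

As in the session above, parallel may still lower the number of child processes on its own when the workload is small.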