Syntax, options, use cases and code examples for all six chapters. Colour-coded by chapter.
SAS code is plain ASCII text — write it in any editor and paste into the SAS Editor. Reference files: FIRST.SAS, SECOND.SAS
DATA MYDATA;
    INPUT ID NAME $ AGE;
    DATALINES;
1 Alice 30
2 Bob 25
;
RUN;

PROC MEANS DATA=MYDATA;
    VAR AGE;
RUN;
End each DATA or PROC step with a RUN; statement. Always check the Log Window after running code. Errors appear in red. Never assume success just because output appeared.
/* Temporary */
DATA MYDATA;
    INPUT ID AGE;
    DATALINES;
1 25
2 30
;
RUN;

/* Permanent: two-part name (the PER libref must already be assigned with LIBNAME) */
DATA PER.CLASS;
    SET MYDATA;
RUN;
/* $ marks character variables; values must be blank-separated */
DATA MYDATA;
    INPUT ID $ SBP DBP GENDER $ AGE WT;
    DATALINES;
1 120 80 F 34 65
2 130 85 M 45 80
;
RUN;
/* @@ holds the line so several subjects can be read from one data line */
DATA MYDATA;
    INPUT ID AGE @@;
    DATALINES;
1 25 2 30 3 22 4 19
;
RUN;
/* Specify exact column ranges; blanks don't need to separate values */
/* Each data line must be padded so every value sits in its stated columns */
DATA MYDATA;
    INPUT ID 1-2 NAME $ 3-12 SBP 13-15 DBP 16-18;
    DATALINES;
01Alice     12080
02Bob       13085
;
RUN;
/* @ pointer = column start; INFORMAT = how to read it */
DATA MYDATA;
    INPUT @1 SBP 3.       /* 3-digit number at col 1 */
          @4 DBP 3.       /* 3-digit number at col 4 */
          @7 DOB DATE9.;  /* DDMONYYYY date at col 7 */
    DATALINES;
12080 12JAN1990
;
RUN;
| Informat | Raw Value | Read As |
|---|---|---|
| Comma7. | $40,000 | 40000 |
| Comma10.2 | 190,020.22 | 190020.22 |
| Dollar10.2 | $19020.22 | 19020.22 |
| Date9. | 12JAN1999 | SAS numeric date value |
Reference files: DFREEFORM.SAS, DCOLUMN.SAS, DINFORMAT.SAS
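The informats in the table can also be applied to blank-separated values by putting a colon before the informat name. The following is a minimal sketch, not taken from the reference files; the dataset name PAY and the variables SALARY, BONUS and HIRED are hypothetical names chosen for illustration.

```sas
/* Hypothetical dataset and variable names, for illustration only */
DATA PAY;
    /* The colon lets list input apply an informat to a blank-delimited value */
    INPUT SALARY :COMMA7. BONUS :DOLLAR10.2 HIRED :DATE9.;
    FORMAT HIRED DATE9.;   /* display the stored date number as DDMONYYYY */
    DATALINES;
$40,000 $19020.22 12JAN1999
;
RUN;

PROC PRINT DATA=PAY;
RUN;
```

Without the FORMAT statement, HIRED would print as a raw SAS date number rather than a readable date.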
DATA MYDATA;
    INFILE "C:\Data\myfile.txt";
    INPUT ID AGE SBP DBP;
RUN;
/* Comma-delimited file; FIRSTOBS=2 skips the header row */
DATA MYDATA;
    INFILE "C:\Data\myfile.csv" DLM="," DSD FIRSTOBS=2;
    INPUT ID NAME $ AGE;
RUN;
Reference file: DINPUT.SAS
Stored as a .SAS7BDAT file on disk.
/* Write using full path */
DATA "C:\SASDATA\PEOPLE";
    INPUT ID NAME $ AGE;
    DATALINES;
1 Alice 30
;
RUN;

/* Read back using the same path */
PROC PRINT DATA="C:\SASDATA\PEOPLE";
RUN;
/* Assign a library shortname to a folder */
LIBNAME SAS_LIB "C:\Users\You\Documents\Library";

/* Use two-part names to read/write */
DATA SAS_LIB.MYDATA;
    SET MYDATA;
RUN;
Also create libraries via: Explorer → New Library → Browse → Save. Reference file: WRITE.SAS
PROC IMPORT OUT = SAS_LIB.EXAMPLE
    DATAFILE = "C:\Data\Example.xlsx"
    DBMS = XLSX REPLACE;
    SHEET = "Sheet1";
    GETNAMES = YES;
RUN;

/* Verify with PROC CONTENTS */
PROC CONTENTS DATA=SAS_LIB.EXAMPLE;
RUN;
PROC CONTENTS shows variable names, types and widths — always use after importing to verify.
DATA MYDATA2;
    SET MYDATA;
    LABEL SBP    = "Systolic Blood Pressure"
          DBP    = "Diastolic Blood Pressure"
          WT     = "Weight in Kilograms"
          GENDER = "Biological Sex";
RUN;

/* Display labels as column headers in PROC PRINT */
PROC PRINT DATA=MYDATA2 LABEL;
RUN;
Single or double quotes both work. Labels up to 256 characters. Reference file: DLABEL.SAS
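DATA MYDATA2;
    SET MYDATA;
    AREA = WIDTH * LENGTH;
    CELSIUS = (FAHRENHEIT - 32) * (5/9);
    TOTAL = SUM(SCORE1, SCORE2, SCORE3);
    /* 'C' = continuous method: counts completed years rather than calendar-year boundaries crossed */
    AGEYEARS = INTCK('YEAR', DOB, TODAY(), 'C');
RUN;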
Reference file: DCALC.SAS
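DATA MYDATA2;
    SET MYDATA;
    LENGTH BP_CAT $ 8;  /* otherwise "Normal" fixes the length at 6 and "Elevated" is truncated */
    IF SBP = . THEN BP_CAT = " ";  /* missing compares below any number, so test it first */
    ELSE IF SBP < 120 THEN BP_CAT = "Normal";
    ELSE IF SBP < 130 THEN BP_CAT = "Elevated";
    ELSE IF SBP >= 140 THEN BP_CAT = "High";
    /* SBP 130-139 matches no branch, so BP_CAT stays blank for those records */
RUN;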
/* These statements must sit inside a DATA step */
DATA MYDATA2;
    SET MYDATA;
    /* Numeric missing = . (period) */
    IF AGE EQ -9 THEN AGE = .;
    /* Character missing = "" (two quote marks) */
    IF GENDER NE "M" AND GENDER NE "F" THEN GENDER = "";
    /* Subset: delete records from the new dataset only */
    IF GENDER EQ "M" THEN DELETE;
RUN;
DELETE removes records only from the dataset being created; the source data is never affected. Reference files: DCONDITION.SAS, DMISSING.SAS
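A numeric missing value compares as lower than any number, which can quietly distort conditions. The sketch below illustrates this with made-up data; the dataset name CHECK and the variables IS_MISSING and UNDER30 are hypothetical names used only for this example.

```sas
/* Hypothetical dataset and variable names, for illustration only */
DATA CHECK;
    INPUT ID AGE;
    IF AGE EQ -9 THEN AGE = .;   /* recode the -9 placeholder to missing */
    IS_MISSING = MISSING(AGE);   /* 1 if AGE is missing, 0 otherwise */
    UNDER30 = (AGE < 30);        /* caution: a missing AGE also counts as under 30 */
    DATALINES;
1 25
2 -9
3 41
;
RUN;

PROC PRINT DATA=CHECK;
RUN;
```

For ID 2 both IS_MISSING and UNDER30 come out as 1, so it is usually safer to test for missing values before applying a less-than comparison.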
/* DROP: read all, discard listed variables */
DATA MYDATA_CLEAN;
    INPUT A B C D E F G;
    DROP E F;
    /* one placeholder row below; replace with real data (comments are not allowed inside DATALINES) */
    DATALINES;
1 2 3 4 5 6 7
;
RUN;

/* KEEP: only retain listed variables */
DATA MYDATA_SLIM;
    SET MYDATA;
    KEEP ID AGE GENDER SBP DBP;
RUN;
Use KEEP when retaining few variables; use DROP when excluding just a few. Same outcome — pick whichever is shorter. Reference file: DKEEP.SAS
/* Ascending (default) */
PROC SORT DATA=MYDATA;
    BY AGE;
RUN;

/* Descending */
PROC SORT DATA=MYDATA;
    BY DESCENDING AGE;
RUN;

/* Multiple variables, sorted into a new dataset */
PROC SORT DATA=MYDATA OUT=SORTED;
    BY GENDER DESCENDING AGE;
RUN;
Run PROC SORT on the BY variable before using a BY statement in PROC MEANS. Reference file: DSORT.SAS
/* Append: stack the records of DATASET_B under those of DATASET_A */
DATA COMBINED;
    SET DATASET_A DATASET_B;
RUN;
/* Step 1: sort both by key */
PROC SORT DATA=DEMOGRAPHICS;
    BY ID;
RUN;
PROC SORT DATA=LABRESULTS;
    BY ID;
RUN;

/* Step 2: merge */
DATA MERGED;
    MERGE DEMOGRAPHICS LABRESULTS;
    BY ID;
RUN;
Reference files: DAPPEND1.SAS, DMERGE.SAS, DSUBSET2.SAS
/* PROC FORMAT takes no DATA= option; formats are defined independently of any dataset */
PROC FORMAT;
    VALUE FMTMARRIED 0 = "NO"
                     1 = "YES";
RUN;

ODS HTML;
PROC PRINT DATA=MYDATA;
    FORMAT MARRIED FMTMARRIED.;
RUN;
ODS HTML CLOSE;
/* Character format: the name starts with $ */
PROC FORMAT;
    VALUE $FMTGENDER "F" = "FEMALE"
                     "M" = "MALE";
RUN;

ODS HTML;
PROC PRINT DATA=MYDATA;
    FORMAT GENDER $FMTGENDER.;
RUN;
ODS HTML CLOSE;
Numeric formats have no $; character formats always start with $. Always end the format name with a period when applying it, e.g. FORMAT MARRIED FMTMARRIED.; Reference file: DFORMAT2.SAS
ODS HTML;
PROC PRINT DATA=MYDATA DOUBLE LABEL
           N="Total observations:" OBS="Record #" ROUND;
RUN;
ODS HTML CLOSE;
Options go in the PROC statement, after the dataset name. Reference file: APRINT1.SAS
ODS HTML;
PROC PRINT DATA=MYDATA;
RUN;
ODS HTML CLOSE;

ODS PDF;
PROC MEANS DATA=MYDATA;
RUN;
ODS PDF CLOSE;
TITLE "Employee Survey Results 2024";
TITLE2 "Subset: Department A";
FOOTNOTE "Source: HR Database";
PROC PRINT DATA=MYDATA;
RUN;

/* Clear after use */
TITLE;
FOOTNOTE;
Supports up to TITLE10 / FOOTNOTE10. Declare without text to clear. Reference file: DTITLE.SAS
PROC MEANS DATA=MYDATA MAXDEC=2 N MEAN STD MIN MAX MEDIAN RANGE SUM;
    VAR variable1 variable2;
    BY GroupVar;       /* must be pre-sorted */
    /* OR */
    CLASS GroupVar;    /* no pre-sort needed */
RUN;
/* BY: must sort first */
PROC SORT DATA=MYDATA;
    BY GENDER;
RUN;
PROC MEANS DATA=MYDATA MEAN STD;
    VAR AGE;
    BY GENDER;
RUN;

/* CLASS: no sorting required */
PROC MEANS DATA=MYDATA MEAN STD;
    VAR AGE;
    CLASS GENDER;
RUN;
Use CLASS for convenience; BY when data is already sorted. Reference files: AMEANS.1.SAS, AMEANS.2.SAS
PROC UNIVARIATE DATA=MYDATA NORMAL PLOT;
    VAR variable1;
    HISTOGRAM variable1 / NORMAL(COLOR=RED W=2);
    QQPLOT variable1;
    PROBPLOT variable1;
    INSET MEAN STD / FORMAT=(4.0);
RUN;
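Normality check: Skewness ≈ 0, Kurtosis ≈ 0, and a NORMAL test p-value > 0.05 together mean the data are consistent with a normal distribution (the tests give no evidence against normality, which is not proof of it). Reference files: AUNI1.SAS, AUNI2.SAS, AUNI4.SAS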
PROC MEANS vs PROC UNIVARIATE: MEANS = quick summary table. UNIVARIATE = normality tests, distribution plots, outlier detection.
All commented text displays in green in the SAS editor. If it's not green, your comment syntax is wrong.
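For reference, SAS accepts two comment styles. The sketch below shows both, reusing the MYDATA dataset name from the earlier examples.

```sas
* Statement-style comment: starts with an asterisk and ends with a semicolon;

/* Block-style comment: can span
   several lines and can sit inside a statement */
PROC PRINT DATA=MYDATA /* comment inside a statement */ ;
RUN;
```

A statement-style comment that is missing its closing semicolon swallows the next statement, which is a common reason for code that unexpectedly turns green.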