// SAS Programming · Complete Study Notes

SAS Programming
Chapters 1 – 6

Syntax, options, use cases and code examples for all six chapters. Colour-coded by chapter.

Chapter 1

Getting Started with SAS

🖥
The SAS Environment
Windows, panels and editor colour coding
Interface
Key Windows
  • Editor — write SAS code (stored as ASCII)
  • Log — run details; errors in red, warnings in green, notes in blue
  • Output — analysis results after a run
  • Explorer/Results — libraries, datasets, tree view
  • Graph — appears when graphs are produced
Editor Colour Coding
  • Green — comments
  • Red — errors
  • Dark Blue — SAS keywords
  • Yellow — data values (classic SAS)
  • Boundary line — separates steps
💡

SAS code is plain ASCII text — write it in any editor and paste into the SAS Editor. Reference files: FIRST.SAS, SECOND.SAS

⚙️
How SAS Works — Three Steps
DATA → PROC → Output
Concept
1. DATA Step
  • Enter or manipulate data
  • Creates a SAS dataset
  • Runs row by row sequentially
2. PROC Step
  • Calls built-in procedures
  • MEANS, PRINT, SORT, IMPORT…
3. Output Step
  • Observes and directs output
  • Controlled by ODS statements
Minimal working program
DATA MYDATA;
  INPUT ID NAME $ AGE;
  DATALINES;
  1 Alice 30
  2 Bob   25
  ;
RUN;

PROC MEANS DATA=MYDATA;
  VAR AGE;
RUN;
🔑
Tips, Tricks & Common Errors
Rules for writing valid SAS code
Syntax
Syntax Rules
  • One statement can span multiple lines
  • Several statements can share one line
  • SAS is not case-sensitive
  • Every statement ends with a semicolon ;
Top 3 Errors
  • #1 — Missing or misplaced semicolon ;
  • #2 — Missing RUN; statement
  • #3 — Unmatched quotation marks
🔴

Always check the Log Window after running code. Errors appear in red. Never assume success just because output appeared.
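The syntax rules above can be seen in a short sketch. The dataset name DEMO and its values are illustrative assumptions, not taken from the reference files.

```sas
/* One statement spread over several lines */
DATA DEMO;
  INPUT ID
        NAME $
        AGE;
  DATALINES;
1 Alice 30
2 Bob 25
;
RUN;

/* Several statements on one line, in mixed case:
   keywords and variable names are not case-sensitive */
proc print data=demo; Var id Name AGE; run;
```

Dropping any one of the semicolons above is a quick way to see error #1 in the Log.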

Chapter 2

Getting Data into SAS

📦
The DATA Statement & Dataset Names
Temporary vs permanent datasets
Data Step
Temporary (one-part name)
  • Exists only during the current session
  • Auto-stored in the WORK library
  • Example: DATA MYDATA;
Permanent (two-part name)
  • Saved to disk beyond the session
  • Format: LIBRARY.DATASET
  • Example: DATA PER.CLASS;
Temporary vs Permanent
/* Temporary */
DATA MYDATA;
  INPUT ID AGE;
  DATALINES;
  1 25
  2 30
  ;
RUN;

/* Permanent: two-part name; the libref PER must already be assigned with a LIBNAME statement */
DATA PER.CLASS;
  SET MYDATA;
RUN;
📋
Variable Names & Types
Naming rules and the three SAS variable types
Fundamentals
Naming Rules
  • 1–32 characters, no blanks
  • Must start with A–Z or underscore _
  • Can include numbers (not first character)
  • Case-insensitive
  • Valid: GENDER, AGE_IN_1999, _OUTCOME
  • Invalid: AGE IN 2000, 2000Count, AGE-In-2000
Three Variable Types
  • Numeric — arithmetic calculations, grouping codes
  • Character ($) — text/string, not used in arithmetic
  • Date — not a separate storage type: held as a numeric value (days since 1 January 1960); many formats accepted (10/15/09, JAN052010, etc.)
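A minimal sketch of how a date is held as a number. The dataset and variable names (DATEDEMO, DOB, DAY0, RAW) are assumptions made for illustration.

```sas
DATA DATEDEMO;
  DOB  = MDY(1, 5, 2010);     /* 5 January 2010 as a SAS date */
  DAY0 = '01JAN1960'D;        /* date literal; stored as 0 */
  RAW  = DOB;                 /* same value, no format attached */
  FORMAT DOB DATE9. DAY0 MMDDYY10.;
RUN;

PROC PRINT DATA=DATEDEMO; RUN;
```

DOB prints as 05JAN2010 and DAY0 as 01/01/1960, while RAW prints as a plain day count, because only the FORMAT statement changes how the number is displayed.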
📥
Four Methods of Reading Data
Freeform, Compact, Column, and Formatted input
Input
Method 1 — Freeform List Input
/* $ marks character variables; values must be blank-separated */
DATA MYDATA;
  INPUT ID $ SBP DBP GENDER $ AGE WT;
  DATALINES;
  1 120 80 F 34 65
  2 130 85 M 45 80
  ;
RUN;
✅ Advantages
  • Minimal specification
  • No fixed column positions required
  • Best for blank-separated data
❌ Disadvantages
  • Variables must be in INPUT order
  • Values need ≥ 1 blank separator
  • Missing values must use .
Method 2 — Compact Method (@@)
/* @@ holds the current line across DATA step iterations, so one line can supply several observations */
DATA MYDATA;
  INPUT ID AGE @@;
  DATALINES;
  1 25 2 30 3 22 4 19
  ;
RUN;
Method 3 — Column Input
/* Specify exact column ranges: blanks don't need to separate values.
   Data lines must start in column 1, so they are not indented. */
DATA MYDATA;
  INPUT ID 1-2  NAME $ 3-12  SBP 13-15  DBP 16-18;
  DATALINES;
01Alice     12080
02Bob       13085
;
RUN;
✅ Advantages
  • No blanks needed between fields
  • Embedded spaces in names work (John Smith)
  • Character fields can be wide (a character variable holds up to 32,767 characters)
❌ Disadvantages
  • Data must be in fixed column positions
  • Blank fields read as missing
  • More specs than list input
Method 4 — Formatted Input (@ pointer)
/* @ pointer = column start; INFORMAT = how to read it.
   Data lines must start in column 1, so they are not indented. */
DATA MYDATA;
  INPUT @1 SBP  3.      /* 3-digit number at col 1 */
        @4 DBP  3.      /* 3-digit number at col 4 */
        @7 DOB  DATE9.; /* DDMONYYYY date at col 7 */
  DATALINES;
12080 12JAN1990
;
RUN;
Common SAS INFORMATs
Informat      Raw Value     Read As
Comma7.       $40,000       40000
Comma10.2     190,020.22    190020.22
Dollar10.2    $19020.22     19020.22
Date9.        12JAN1999     SAS numeric date value
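The informats listed here could be tried with a sketch like the one below. The dataset name MONEY, the variable names and the column positions are assumptions chosen for the example.

```sas
DATA MONEY;
  INPUT @1  SALARY COMMA7.      /* "$40,000"   in cols 1-7  */
        @9  BONUS  DOLLAR10.2   /* "$19020.22" from col 9   */
        @20 HIRED  DATE9.;      /* "12JAN1999" from col 20  */
  FORMAT HIRED DATE9.;          /* display the date readably */
  DATALINES;
$40,000 $19020.22  12JAN1999
;
RUN;

PROC PRINT DATA=MONEY; RUN;
```

SALARY is stored as 40000 and BONUS as 19020.22: the informat strips the dollar sign and commas while reading.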
💡

Reference files: DFREEFORM.SAS, DCOLUMN.SAS, DINFORMAT.SAS

📂
INFILE Statement & Options
Read external ASCII/text files instead of inline DATALINES
Input
Basic INFILE (must appear before INPUT)
DATA MYDATA;
  INFILE "C:\Data\myfile.txt";
  INPUT ID AGE SBP DBP;
RUN;
DLM = ","
Custom delimiter (e.g. comma for CSV)
DSD
Two consecutive delimiters = missing value; also strips quotes around values and defaults the delimiter to a comma
MISSOVER
Sets the remaining variables to missing when a line runs short, instead of reading on into the next line
FIRSTOBS = n
Start reading at row n (skip headers)
OBS = n
Stop reading at row n
CSV file with options
DATA MYDATA;
  INFILE "C:\Data\myfile.csv"
         DLM="," DSD FIRSTOBS=2;
  INPUT ID NAME $ AGE;
RUN;
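A small sketch of MISSOVER with a short record. It uses INFILE DATALINES so the option can be tried on inline data; the dataset name SHORTROWS and the values are illustrative.

```sas
DATA SHORTROWS;
  INFILE DATALINES MISSOVER;   /* stay on the current line */
  INPUT ID AGE SBP DBP;
  DATALINES;
1 34 120 80
2 45
3 29 118 76
;
RUN;

PROC PRINT DATA=SHORTROWS; RUN;
```

With MISSOVER the result has three observations, and ID 2 has missing SBP and DBP. Without it, SAS reads on into the next line to fill SBP and DBP for ID 2, and only two observations are created.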
💡

Reference file: DINPUT.SAS

Chapter 3

Reading, Writing & Importing Data

🗄️
SAS Libraries & Permanent Datasets
Storing data beyond the current session
Libraries
Temporary Dataset
  • Single-level name: DATA PEOPLE;
  • Lost when session ends
  • Stored in WORK library
Permanent Dataset
  • Two-part name: MYSASLIB.PEOPLE
  • Saved as .SAS7BDAT file on disk
  • Writable via full Windows path
Windows path technique
/* Write using full path */
DATA "C:\SASDATA\PEOPLE";
  INPUT ID NAME $ AGE;
  DATALINES;
  1 Alice 30
  ;
RUN;

/* Read back using the same path */
PROC PRINT DATA="C:\SASDATA\PEOPLE"; RUN;
LIBNAME statement
/* Assign a library shortname to a folder */
LIBNAME SAS_LIB "C:\Users\You\Documents\Library";

/* Use two-part names to read/write */
DATA SAS_LIB.MYDATA; SET MYDATA; RUN;
💡

Also create libraries via: Explorer → New Library → Browse → Save. Reference file: WRITE.SAS

📊
PROC IMPORT & Importing Excel/CSV
Read external files (XLS, XLSX, CSV) into SAS datasets
Import
Excel prep checklist
  • Row 1 = valid SAS variable names
  • Each column = one consistent variable type
  • Blank cells → missing value (.)
Key options
  • OUT = — output SAS dataset
  • DATAFILE = — path to input file
  • DBMS = — file type (XLSX, CSV, XLS)
  • REPLACE — overwrite existing dataset
  • SHEET = — which worksheet
  • GETNAMES = — YES if row 1 has names
PROC IMPORT example
PROC IMPORT
  OUT      = SAS_LIB.EXAMPLE
  DATAFILE = "C:\Data\Example.xlsx"
  DBMS     = XLSX
  REPLACE;
  SHEET    = "Sheet1";
  GETNAMES = YES;
RUN;

/* Verify with PROC CONTENTS */
PROC CONTENTS DATA=SAS_LIB.EXAMPLE; RUN;
💡

PROC CONTENTS shows variable names, types and widths — always use after importing to verify.
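For a CSV file the same options apply, minus SHEET. This is a sketch only: the path and the output dataset name are hypothetical, and GUESSINGROWS is an extra statement not covered in the notes above.

```sas
PROC IMPORT
  OUT      = WORK.EXAMPLE_CSV
  DATAFILE = "C:\Data\Example.csv"   /* hypothetical path */
  DBMS     = CSV
  REPLACE;
  GETNAMES     = YES;    /* row 1 holds variable names */
  GUESSINGROWS = 100;    /* rows scanned to decide type and width */
RUN;

PROC CONTENTS DATA=WORK.EXAMPLE_CSV; RUN;
```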

Chapter 4

Preparing Data for Analysis

🏷️
LABEL Statement
Give descriptive names to cryptic variable names
Data Prep
Syntax & example
DATA MYDATA2;
  SET MYDATA;
  LABEL
    SBP    = "Systolic Blood Pressure"
    DBP    = "Diastolic Blood Pressure"
    WT     = "Weight in Kilograms"
    GENDER = "Biological Sex";
RUN;

/* Display labels as column headers in PROC PRINT */
PROC PRINT DATA=MYDATA2 LABEL; RUN;
💡

Single or double quotes both work. Labels up to 256 characters. Reference file: DLABEL.SAS

🔢
Creating New Variables & SAS Functions
Arithmetic operators and built-in functions
Data Prep
Arithmetic Operators
  • + Addition   - Subtraction
  • * Multiplication   / Division
  • ** Exponentiation
  • Order: Parentheses → Exponents → × ÷ → + −
Important Functions
  • INT(x) — integer part
  • MAX(x1,x2,…) — maximum
  • MIN(x1,x2,…) — minimum
  • SUM(x1,x2,…) — sum
  • ROUND(x,unit) — round
  • MDY(m,d,y) — create SAS date
  • INTCK('interval',s,e) — number of interval boundaries crossed between two dates
Creating variables
DATA MYDATA2;
  SET MYDATA;
  AREA      = WIDTH * LENGTH;
  CELSIUS   = (FAHRENHEIT - 32) * (5/9);
  TOTAL     = SUM(SCORE1, SCORE2, SCORE3);
  AGEYEARS  = INTCK('YEAR', DOB, TODAY(), 'C');  /* 'C' counts completed years, not calendar-year boundaries */
RUN;
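A quick way to check what the functions return is a DATA _NULL_ step that writes to the Log. The values below are made up for illustration.

```sas
DATA _NULL_;
  A  = INT(7.9);                 /* 7 */
  B  = ROUND(3.14159, 0.01);     /* 3.14 */
  C  = MAX(3, 9, 4);             /* 9 */
  D  = SUM(1, ., 3);             /* 4: SUM ignores missing values */
  E  = 1 + . + 3;                /* . : the + operator propagates missing */
  Y1 = INTCK('YEAR', '31DEC2009'D, '01JAN2010'D);        /* 1: one boundary crossed */
  Y2 = INTCK('YEAR', '31DEC2009'D, '01JAN2010'D, 'C');   /* 0: no full year elapsed */
  PUT A= B= C= D= E= Y1= Y2=;
RUN;
```

The difference between D and E is the usual reason to prefer SUM() over + when some scores may be missing.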
💡

Reference file: DCALC.SAS

🔀
IF-THEN-ELSE & Missing Values
Conditional logic and handling invalid data
Data Prep
IF-THEN-ELSE
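DATA MYDATA2;
  SET MYDATA;
  LENGTH BP_CAT $ 8;   /* otherwise "Normal" sets the length to 6 and "Elevated" is truncated */

  IF      SBP =  .   THEN BP_CAT = " ";        /* missing compares lower than any number */
  ELSE IF SBP <  120 THEN BP_CAT = "Normal";
  ELSE IF SBP <  130 THEN BP_CAT = "Elevated";
  ELSE IF SBP >= 140 THEN BP_CAT = "High";
  /* SBP 130-139 matches no condition, so BP_CAT stays blank */
RUN;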
Assigning & flagging missing values
DATA MYDATA2;
  SET MYDATA;

  /* Numeric missing = .  (period) */
  IF AGE EQ -9 THEN AGE = .;

  /* Character missing = " " (a quoted blank) */
  IF GENDER NE "M" AND GENDER NE "F"
    THEN GENDER = " ";

  /* Subset: drop matching records from the dataset being created */
  IF GENDER EQ "M" THEN DELETE;
RUN;
📌

DELETE only removes records from the dataset being created; the input dataset and the raw data are never affected. Reference files: DCONDITION.SAS, DMISSING.SAS
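One pitfall worth a sketch: a numeric missing value compares lower than any number, so a test such as SBP < 120 is true for missing SBP. The dataset BPCHECK and its values are illustrative; MISSING() is a built-in function that works for numeric and character variables.

```sas
DATA BPCHECK;
  INPUT ID SBP;
  LENGTH FLAG $ 12;
  IF MISSING(SBP)    THEN FLAG = "No reading";
  ELSE IF SBP < 120  THEN FLAG = "Below 120";
  ELSE                    FLAG = "120 or more";
  LOW_NAIVE = (SBP < 120);   /* 1 for ID 2 as well, because . < 120 is true */
  DATALINES;
1 118
2 .
3 135
;
RUN;

PROC PRINT DATA=BPCHECK; RUN;
```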

✂️
DROP & KEEP Statements
Select which variables to retain in a dataset
Data Prep
DROP and KEEP
/* DROP — read all, discard listed variables */
DATA MYDATA_CLEAN;
  INPUT A B C D E F G;
  DROP E F;
  /* Placeholder row: replace with real data (comments are not allowed inside DATALINES) */
  DATALINES;
  1 2 3 4 5 6 7
  ;
RUN;

/* KEEP — only retain listed variables */
DATA MYDATA_SLIM;
  SET MYDATA;
  KEEP ID AGE GENDER SBP DBP;
RUN;
💡

Use KEEP when retaining few variables; use DROP when excluding just a few. Same outcome — pick whichever is shorter. Reference file: DKEEP.SAS

🔃
PROC SORT
Sort observations ascending or descending
Sorting
DATA =
Input dataset
OUT =
Output dataset (omit to sort in place)
BY var(s)
One or more sort keys (ascending default)
DESCENDING
Place before a variable for descending order
Sorting examples
/* Ascending (default) */
PROC SORT DATA=MYDATA; BY AGE; RUN;

/* Descending */
PROC SORT DATA=MYDATA; BY DESCENDING AGE; RUN;

/* Multiple variables — sort to new dataset */
PROC SORT DATA=MYDATA OUT=SORTED;
  BY GENDER DESCENDING AGE;
RUN;
📌

Run PROC SORT on the BY variable(s) before using a BY statement in PROC MEANS or any other step, unless the data is already in that order. Reference file: DSORT.SAS

🔗
Appending & Merging Datasets
Stack rows (SET) or join by a common key (MERGE + BY)
Combining
Append (SET)
  • Stacks rows from multiple datasets
  • Compatible structures required
Merge (MERGE + BY)
  • Joins on a common key variable
  • Both datasets must be sorted by the BY variable first
Append with SET
DATA COMBINED;
  SET DATASET_A DATASET_B;
RUN;
Merge with MERGE + BY
/* Step 1: sort both by key */
PROC SORT DATA=DEMOGRAPHICS; BY ID; RUN;
PROC SORT DATA=LABRESULTS;   BY ID; RUN;

/* Step 2: merge */
DATA MERGED;
  MERGE DEMOGRAPHICS LABRESULTS;
  BY ID;
RUN;
💡

Reference files: DAPPEND1.SAS, DMERGE.SAS, DSUBSET2.SAS
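A worked sketch of a merge where the two datasets do not hold exactly the same IDs. The dataset names and values are invented for illustration, and the IN= dataset option is an addition not covered in the notes above.

```sas
DATA DEMOG;
  INPUT ID GENDER $;
  DATALINES;
1 F
2 M
3 F
;
RUN;

DATA LABS;
  INPUT ID SBP;
  DATALINES;
1 120
3 135
4 128
;
RUN;

/* Both datasets are already in ID order, so no PROC SORT is needed here */
DATA MATCHED;
  MERGE DEMOG(IN=INDEMOG) LABS(IN=INLABS);
  BY ID;
  IF INDEMOG AND INLABS;   /* keep only IDs found in both: 1 and 3 */
RUN;
```

Without the subsetting IF, the merged dataset holds IDs 1 to 4, with SBP missing for ID 2 and GENDER blank for ID 4.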

Chapter 5

SAS Procedures

🏷
PROC FORMAT
Relabel coded values for display without altering raw data
Formatting
Numeric format
PROC FORMAT;   /* PROC FORMAT takes no DATA= option: formats are defined independently of any dataset */
  VALUE FMTMARRIED  0 = "NO"
                     1 = "YES";
RUN;

ODS HTML;
PROC PRINT DATA=MYDATA;
  FORMAT MARRIED FMTMARRIED.;
RUN;
ODS HTML CLOSE;
Character format ($)
PROC FORMAT;   /* no DATA= option here either */
  VALUE $FMTGENDER  "F" = "FEMALE"
                     "M" = "MALE";
RUN;
ODS HTML;
PROC PRINT DATA=MYDATA;
  FORMAT GENDER $FMTGENDER.;
RUN;
ODS HTML CLOSE;
💡

Numeric formats have no $; character formats always start with $. Always end the format call with a period: FMTMARRIED. Reference file: DFORMAT2.SAS

🖨
PROC PRINT (Modified)
Control the appearance of printed output
Output
DOUBLE
Double-spaces output rows
LABEL
Shows variable labels as column headers
N = "text"
Heading + total observation count
OBS = "text"
Custom label for observation column
ROUND
Rounds unformatted numeric values to two decimal places before printing
Example
ODS HTML;
PROC PRINT DATA=MYDATA
           DOUBLE LABEL
           N="Total observations:"
           OBS="Record #"
           ROUND;
RUN;
ODS HTML CLOSE;
📌

Options go in the PROC statement, after the dataset name. Reference file: APRINT1.SAS

📤
ODS — Output Delivery System
Export output to HTML, PDF, RTF and more
Output
How it works
  • Wrap code between ODS …; and ODS … CLOSE;
  • HTML is the default destination in the SAS windowing environment from SAS 9.3 onwards
Destinations
  • HTML — browser-viewable (default)
  • PDF — printable document
  • RTF — Word-compatible
  • LISTING — classic plain text
ODS template
ODS HTML;
PROC PRINT DATA=MYDATA; RUN;
ODS HTML CLOSE;

ODS PDF;
PROC MEANS DATA=MYDATA; RUN;
ODS PDF CLOSE;
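To control where the file is written, the destination can be given a FILE= option. This is a sketch with a hypothetical path; MYDATA is the dataset used throughout these notes.

```sas
ODS PDF FILE="C:\Data\report.pdf";   /* hypothetical output path */
TITLE "Summary of MYDATA";
PROC MEANS DATA=MYDATA; RUN;
ODS PDF CLOSE;                        /* the file is complete only after CLOSE */
TITLE;                                /* clear the title afterwards */
```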
📝
TITLE & FOOTNOTE Statements
Add text to the top or bottom of every output page
Output
Usage
TITLE    "Employee Survey Results 2024";
TITLE2   "Subset: Department A";
FOOTNOTE "Source: HR Database";

PROC PRINT DATA=MYDATA; RUN;

/* Clear after use */
TITLE; FOOTNOTE;
💡

Supports up to TITLE10 / FOOTNOTE10. Declare without text to clear. Reference file: DTITLE.SAS

💬
Comments in SAS Code
Annotate your programs — always appears green in editor
Syntax
Both comment styles
/* Block comment — can span multiple lines */

* Statement comment — ends with semicolon;

DATA MYDATA;
  /* Inline: explaining this step */
  SET RAWDATA;
RUN;
🟢

All commented text displays in green in the SAS editor. If it's not green, your comment syntax is wrong.

Chapter 6

Evaluating Quantitative Data

📊
PROC MEANS
Descriptive statistics for quantitative variables
Statistics
What it does
  • Evaluates quantitative/numerical data
  • Groups analysis via BY or CLASS
  • Helps spot outliers quickly
Default statistics
  • N — count of non-missing observations
  • MEAN — arithmetic average
  • STD — standard deviation
  • MIN / MAX — extremes
Full syntax
PROC MEANS DATA=MYDATA MAXDEC=2
           N MEAN STD MIN MAX MEDIAN RANGE SUM;
  VAR   variable1 variable2;
  CLASS GroupVar;       /* no pre-sort needed */
  /* Alternative to CLASS (use one or the other):
     BY GroupVar;          data must be pre-sorted by GroupVar */
RUN;
MAXDEC = n
Decimal places in output
N / NMISS
Non-missing / missing obs counts
MEAN / STD
Average / standard deviation
MIN / MAX
Minimum and maximum values
RANGE / SUM
MAX−MIN / total sum
VAR / STDERR
Variance / standard error
MEDIAN
50th percentile
CLM
Confidence limits for mean
BY vs CLASS
/* BY — must sort first */
PROC SORT DATA=MYDATA; BY GENDER; RUN;
PROC MEANS DATA=MYDATA MEAN STD;
  VAR AGE; BY GENDER;
RUN;

/* CLASS — no sorting required */
PROC MEANS DATA=MYDATA MEAN STD;
  VAR AGE; CLASS GENDER;
RUN;
💡

Use CLASS for convenience; BY when data is already sorted. Reference files: AMEANS.1.SAS, AMEANS.2.SAS
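A self-contained sketch of grouped statistics with a missing value. The dataset BPSTATS and its values are invented for illustration.

```sas
DATA BPSTATS;
  INPUT GENDER $ AGE SBP;
  DATALINES;
F 34 120
F 41 .
M 45 130
M 29 118
;
RUN;

PROC MEANS DATA=BPSTATS N NMISS MEAN MIN MAX MAXDEC=1;
  CLASS GENDER;
  VAR AGE SBP;
RUN;
```

For GENDER = F, SBP shows N = 1 and NMISS = 1: the missing reading is left out of the mean rather than counted as zero.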

📈
PROC UNIVARIATE
Normality testing, outlier detection and distribution plots
Distribution
What it does
  • Tests whether data is normally distributed
  • Identifies outliers
  • Produces histograms, QQ plots, probability plots
Key output metrics
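  • Skewness — symmetry, ≈ 0 for normal data
  • Kurtosis — tail weight, ≈ 0 for normal data (SAS reports excess kurtosis)
  • Positive kurtosis → heavier tails, usually a sharper peak
  • Negative kurtosis → lighter tails, usually a flatter top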
Full syntax
PROC UNIVARIATE DATA=MYDATA NORMAL PLOT;
  VAR       variable1;
  HISTOGRAM variable1 / NORMAL(COLOR=RED W=2);
  INSET     MEAN STD / FORMAT=4.0;   /* INSET applies to the plot statement just before it */
  QQPLOT    variable1;
  PROBPLOT  variable1;
RUN;
NORMAL
Normality test table (Shapiro-Wilk etc.)
PLOT
Stem-and-leaf, box & probability plots
TRIMMED =
Trimmed mean: drops the given number (or proportion) of extreme values from each tail
CIBASIC
Confidence intervals for mean, std, var
ALPHA =
Alpha level for intervals
MU0 =
Null-hypothesis value of the mean for the location tests (t-test, sign, signed rank)
HISTOGRAM / NORMAL
Bell curve overlay on histogram
QQPLOT
High-quality normal probability plot
INSET
Stats displayed inside the graph
💡

Normality check: Skewness ≈ 0 + Kurtosis ≈ 0 + NORMAL test p-value > 0.05 → no evidence against normality, so the data can be treated as approximately normal. Reference files: AUNI1.SAS, AUNI2.SAS, AUNI4.SAS
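The PROC-statement options listed above that do not appear in the syntax block could be combined as in this sketch. The dataset BPDIST, its values and the choices MU0=120 and TRIMMED=1 are assumptions for illustration.

```sas
DATA BPDIST;
  INPUT SBP @@;
  DATALINES;
118 120 122 125 128 130 131 135 140 162
;
RUN;

PROC UNIVARIATE DATA=BPDIST NORMAL
                CIBASIC ALPHA=0.05   /* 95% limits for mean, std, variance */
                MU0=120              /* test H0: mean = 120 */
                TRIMMED=1;           /* drop 1 value from each tail */
  VAR SBP;
RUN;
```

The value 162 is there on purpose: it shows up in the Extreme Observations table, and the trimmed mean leaves it out.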

📌

PROC MEANS vs PROC UNIVARIATE: MEANS = quick summary table. UNIVARIATE = normality tests, distribution plots, outlier detection.