# Stata | Datasets & Variables

Stata is a software package used for data analysis.

We take **datasets**, load them into Stata, and use the program to organize, analyze, visualize, and report information about **variables**.

So what is a *dataset* and what is a *variable*, you might ask?

Here is a quick overview of datasets and variables in Stata.

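A **dataset** is a collection of related information about various elements (variables). In Stata, a dataset is laid out as a table: each column is a variable and each row is an observation.

A **variable** is a characteristic or feature that can take different values from one observation to the next.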

For example, we may have information about a group of students (a dataset). It may include their grades, ages, and GPAs (variables).

By analyzing these variables, we can describe how they have changed in the past and, using probability and forecasting methods, estimate how they may change in the future.
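To make the student example concrete, here is a minimal sketch of how such a dataset could be typed directly into Stata. The variable names (`name`, `grade`, `age`, `gpa`) and the three rows of values are invented purely for illustration; they do not come from a real dataset.

```stata
* Hypothetical student dataset, entered by hand for illustration only
clear
input str10 name byte grade byte age float gpa
"Ana"   10 15 3.6
"Ben"   11 16 3.1
"Chloe" 12 17 3.9
end

* List each variable (column) along with its storage type
describe

* Show the observations (rows)
list
```

Each column in the `list` output is a variable, and each row is one student.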

Variables in Stata can be either *numeric* or *string*.

**Numeric variables** contain only numbers and can be used for calculations. In the Data Editor's default color scheme, numeric values appear in black, and numeric values that carry value labels appear in blue.

**String variables** may contain numbers, letters, and other characters. Calculations cannot be performed on string variables (even if they contain only numbers) because they are treated as text. String values appear in red in the Data Editor.
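The difference between the two types can be seen in a short sketch. It assumes a hypothetical variable, `age_text`, that holds ages stored as text, and uses Stata's `destring` command to create a numeric copy named `age` (also a made-up name).

```stata
* Hypothetical example: ages stored as text (a string variable)
clear
input str5 age_text
"15"
"16"
"17"
end

* summarize ignores strings, so it reports 0 observations here
summarize age_text

* Create a numeric copy of the string variable
destring age_text, generate(age)

* Calculations now work on the numeric copy
summarize age
generate age_next_year = age + 1
```

Attempting arithmetic on `age_text` itself, such as adding 1 to it, would stop with a type mismatch error, which is why the conversion step comes first.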

Now that we understand the basics of datasets and variables, let's start learning about the Stata interface!