A First Asset#

The cereal dataset#

Our assets will represent and transform a simple but scary CSV dataset, cereal.csv, which contains nutritional facts about 80 breakfast cereals.

Hello, asset!#

Let's write our first Dagster asset and save it as cereal.py.

A software-defined asset specifies an asset that you want to exist and how to compute its contents. Typically, you'll define assets by annotating ordinary Python functions with the @asset decorator.

Our first asset represents a dataset of cereal data, downloaded from the internet.

import csv
import requests
from dagster import asset


@asset
def cereals():
    response = requests.get("https://docs.dagster.io/assets/cereal.csv")
    lines = response.text.split("\n")
    cereal_rows = [row for row in csv.DictReader(lines)]

    return cereal_rows

In this simple case, our asset doesn't depend on any other assets.

Materializing our asset#

Materializing an asset means computing its contents and then writing them to persistent storage. By default, Dagster will pickle the value returned by the function and store it in the local filesystem, using the name of the asset as the name of the file. Where and how the contents are stored is fully customizable - e.g. you might store them in a database or a cloud object store like S3. We'll look at how that works later.

Assuming you’ve saved this code as cereal.py, you can execute it via two different mechanisms:

Dagit#

To visualize your assets in Dagit, run the following command. Make sure you're in the directory that contains the file with your code:

dagit -f cereal.py

You'll see output similar to:

Serving dagit on http://127.0.0.1:3000 in process 70635

You should be able to navigate to http://127.0.0.1:3000 in your web browser and view your asset.

Define an asset

Click the Materialize button to launch a run that materializes the asset. After the run completes, the Activity tab on the Assets page will contain information about the run.

Click the run ID in the Activity tab to view details about the run. A page including a structured stream of logs and events that occurred during the run will display:

Asset run

In this view, you can filter and search through the logs corresponding to the run that's materializing your asset.

Click the cereals link in the upper left corner of the page - next to the Success indicator in the image below - to navigate back to the Asset details page:

Asset details

Success!

Python API#

If you'd rather materialize your asset as a script, you can do that without spinning up Dagit. Just add a few lines to cereal.py. This executes a run within the Python process.

from dagster import materialize

if __name__ == "__main__":
    materialize([cereals])

Now you can just run:

python cereal.py