Data Set Used in Examples

Each example illustrated in this paper uses a Movies data set (table) consisting of six columns: title, length, category, year, studio, and rating. Title, category, studio, and rating are defined as character columns, with length and year being defined as numeric columns, as shown below.
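To confirm that a local copy of the data set matches this structure, a quick check along the following lines may help. This is a minimal sketch that assumes the table is stored as WORK.MOVIES (the name used in the examples that follow); the OBS=5 limit is an arbitrary choice for illustration.

```sas
/* Assumption: the Movies table is available as WORK.MOVIES */
PROC CONTENTS DATA=WORK.MOVIES VARNUM;  /* VARNUM lists columns in creation order */
RUN;

PROC PRINT DATA=WORK.MOVIES (OBS=5);    /* preview the first five rows */
 VAR Title Length Category Year Studio Rating;
RUN;
```

The PROC CONTENTS output should list Title, Category, Studio, and Rating as character columns and Length and Year as numeric columns.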

Producing a Vertical Bar Chart with the SGPLOT Procedure

Whether your data is large, small, or somewhere in between, using color with the SGPLOT procedure can help organize information, engage an audience, and improve comprehension. Color lets your audience see things in the data that would otherwise be hidden or hard to see. PROC SGPLOT creates single-cell bar charts, box plots, bubble plots, dot plots, histograms, line plots, scatter plots, and an assortment of other plot types quickly and easily. The next example uses the SGPLOT procedure’s VBAR statement to chart the Rating variable, with the GROUP=Rating and GROUPDISPLAY=Cluster options applying pre-set (or “default”) colors to the bars of the vertical bar (VBAR) chart.

TITLE 'VBAR Chart with SGPLOT';
PROC SGPLOT DATA=MOVIES;
 /* One bar per rating; GROUP= assigns each rating a default style color */
 VBAR Rating / GROUP=Rating
               DATALABEL
               FILL
               GROUPDISPLAY=Cluster;
RUN;

To enhance or “liven up” the previous results, user-selected colors can be applied to the bars themselves. Options are available to change the “default” colors and color schemes to address specific needs and requirements. The next example illustrates how the preceding results can be modified to produce a vertical bar (VBAR) chart with “custom” colors. 

The first step creates an attribute map (ATTRMAP) SAS data set (or control data set) that contains the values and colors to use in the bar chart, letting you override the “default” colors from the previous results with colors of your own choosing. Three variables are defined in the ATTRMAP data set: ID, Value, and FillColor. The ID variable contains the name of the attribute map (here, Rating), which the ATTRID= option later references; the Value variable contains each movie rating value (e.g., G, PG, PG-13, and R); and the FillColor variable contains the color to use for the corresponding vertical bar. To produce the vertical bar chart with the SGPLOT procedure, the DATA= option specifies the MOVIES data set; the DATTRMAP= procedure option references the attribute map data set; and the VBAR statement specifies the Rating variable along with the GROUP=Rating, GROUPDISPLAY=Cluster, and ATTRID=Rating options.

DATA ATTRMAP;
 /* ID = attribute map name, Value = rating value, FillColor = bar color */
 INPUT @1 ID $6.
       @8 Value $5.
      @14 FillColor $6.;
 DATALINES;
Rating G     Green
Rating PG    Blue
Rating PG-13 Yellow
Rating R     Red
;
RUN;

 

TITLE 'VBAR Chart with SGPLOT';
PROC SGPLOT DATA=MOVIES
       DATTRMAP=ATTRMAP;
 /* ATTRID= selects the rows of ATTRMAP whose ID value is Rating */
 VBAR Rating /  GROUP=Rating
               DATALABEL
               FILL
               GROUPDISPLAY=Cluster
               ATTRID=Rating;
RUN;
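As a lighter-weight alternative to an attribute map, the STYLEATTRS statement (available in SAS 9.4) can override the group fill colors directly. The sketch below is not the approach used in this paper. It assumes the same WORK.MOVIES data set, and unlike an attribute map it assigns colors by group order rather than by rating value, so a color is not guaranteed to stay with a particular rating if the data changes.

```sas
TITLE 'VBAR Chart with STYLEATTRS Colors';
PROC SGPLOT DATA=MOVIES;
 /* Colors are applied to the groups in the order the groups are assigned */
 STYLEATTRS DATACOLORS=(Green Blue Yellow Red);
 VBAR Rating / GROUP=Rating
               DATALABEL
               FILL
               GROUPDISPLAY=Cluster;
RUN;
```

An attribute map remains the safer choice when a specific color must always represent a specific rating.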

Producing a Series Plot with the SGPLOT Procedure

Colors and various color schemes can be applied to chart and plot elements. As was shown in the previous example, pre-set or fixed colors are available “right-out-of-the-box” with the SGPLOT procedure. Options are available to change the “default” colors and color schemes to address specific needs and requirements. In the next example, the SGPLOT procedure produces a series (SERIES) and scatter (SCATTER) plot that uses the pre-set or “default” colors.

/* SERIES connects points in data order, so sort by Year first */
PROC SORT DATA=Movies
          OUT=work.Movies_Sorted;
 BY Year;
RUN;

 

TITLE 'Response over Year by Rating';
PROC SGPLOT DATA=work.Movies_Sorted;
 /* One line per rating, drawn in the default group colors */
 SERIES X=Year
        Y=Length /
             GROUP=Rating
             LINEATTRS=(THICKNESS=3);
 /* Overlay a filled marker on each data point */
 SCATTER X=Year
         Y=Length /
              GROUP=Rating
              MARKERATTRS=
       (SYMBOL=CIRCLEFILLED SIZE=11);
 XAXIS DISPLAY=(NOLABEL);
 YAXIS LABEL='Response';
RUN;

In the next example, the SGPLOT procedure is specified along with a user-defined attribute map (ATTRMAP) to produce a series (SERIES) and scatter (SCATTER) plot with “custom” colors. By defining an attribute map (ATTRMAP) SAS data set that contains the values and colors to use in the series plot, users can override the defaults and control which colors and color schemes are applied. To produce the series plot with the SGPLOT procedure, the DATTRMAP= procedure option references the attribute map data set, and the SERIES statement specifies the GROUPDISPLAY=Cluster, ATTRID=Rating, and DATALABEL=Rating options; the SCATTER statement also specifies ATTRID=Rating.

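DATA ATTRMAP;
 /* LineColor applies to the SERIES lines, MarkerColor to the SCATTER markers */
 INPUT @1 ID $6.
       @8 Value $5.
      @14 LineColor $6.
      @21 MarkerColor $6.;
 DATALINES;
Rating G     Green  Green
Rating PG    Blue   Blue
Rating PG-13 Yellow Yellow
Rating R     Red    Red
;
RUN;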

 

PROC SORT DATA=Movies
          OUT=work.Movies_Sorted;
 BY Year;
RUN;

 

TITLE 'Response over Year by Rating';
PROC SGPLOT DATA=work.Movies_Sorted
       DATTRMAP=ATTRMAP;
 /* ATTRID= ties each rating to its LineColor in the attribute map */
 SERIES X=Year
        Y=Length /
             GROUP=Rating
             LINEATTRS=(THICKNESS=3)
             GROUPDISPLAY=Cluster
             ATTRID=Rating
             DATALABEL=Rating;
 /* The same ATTRID= supplies the MarkerColor for the markers */
 SCATTER X=Year
         Y=Length /
             GROUP=Rating
             MARKERATTRS=
       (SYMBOL=CIRCLEFILLED SIZE=11)
             ATTRID=Rating;
 XAXIS DISPLAY=(NOLABEL);
 YAXIS LABEL='Response';
RUN;
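Because color alone may not distinguish the groups for every reader, an attribute map can also carry line patterns and marker symbols. The following is a minimal sketch rather than part of the original examples: the data set name ATTRMAP_SHAPES and the particular pattern and symbol pairings are assumptions chosen for illustration, while LinePattern and MarkerSymbol are reserved attribute map variable names.

```sas
/* Hypothetical attribute map: color, line pattern, and marker symbol per rating */
DATA ATTRMAP_SHAPES;
 LENGTH ID $6 Value $5 LineColor $6 MarkerColor $6
        LinePattern $9 MarkerSymbol $14;
 INPUT ID Value LineColor LinePattern MarkerSymbol;
 MarkerColor = LineColor;   /* markers reuse the line color */
 DATALINES;
Rating G     Green  Solid     CircleFilled
Rating PG    Blue   Dash      SquareFilled
Rating PG-13 Yellow ShortDash TriangleFilled
Rating R     Red    Dot       DiamondFilled
;
RUN;

TITLE 'Response over Year by Rating';
PROC SGPLOT DATA=work.Movies_Sorted
       DATTRMAP=ATTRMAP_SHAPES;
 /* MARKERS draws a marker at each point of the series line */
 SERIES X=Year
        Y=Length /
             GROUP=Rating
             MARKERS
             LINEATTRS=(THICKNESS=3)
             ATTRID=Rating;
 XAXIS DISPLAY=(NOLABEL);
 YAXIS LABEL='Response';
RUN;
```

The sketch deliberately omits SYMBOL= from the statement options so that the symbols come from the attribute map.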

Producing “Customized” Excel Results

Background and foreground colors in Excel spreadsheets should be chosen carefully so that they contrast sufficiently with one another, preventing contrast issues like those displayed in the Excel spreadsheet below. Color contrast issues should be avoided whenever possible so the results do not become unreadable to a mass audience. As can be seen, black foreground (text) on a blue background provides insufficient contrast, which makes the spreadsheet’s content, including the “Movie Rating” column, difficult to read.

To address the contrast issues illustrated in the previous results, the next example shows the PROC FORMAT and ODS Excel code used to produce the Excel spreadsheet. First, a user-defined format associates each movie rating with a color, applying a traffic-lighting scenario (e.g., “G-rated” movies are shaded in ‘Green’, “PG-rated” movies in ‘Light Blue’, “PG-13-rated” movies in ‘Orange’, and “R-rated” movies in ‘Red’). Then, an ODS Excel statement sends the results to an Excel spreadsheet. Finally, PROC REPORT produces the desired layout for the content of the Excel spreadsheet, along with the traffic-lighting color for each movie rating.

PROC FORMAT;
 /* Maps each rating value to its traffic-lighting background color */
 Value $RatingFmt
         'G'     = 'Green'
         'PG'    = 'Light Blue'
         'PG-13' = 'Orange'
         'R'     = 'Red';
RUN;

 

ODS Excel file='c:/MovieRating with Traffic Lighting.xlsx'
        style=styles.HTMLBlue;

PROC REPORT DATA=WORK.Movies_Sorted NOWINDOWS
           STYLE(Header)={BackGround=White ForeGround=Black Font=(Calibri,10pt,Bold)};
 COLUMNS Title Length Category Year Studio Rating;
 DEFINE Title    / DISPLAY   'Movie Title'        WIDTH=25;
 DEFINE Length   / DISPLAY   'Movie Length'       WIDTH=12 CENTER;
 DEFINE Category / DISPLAY   'Movie Category'     WIDTH=20;
 DEFINE Year     / DISPLAY   'Year of Movie'      WIDTH=13 CENTER;
 DEFINE Studio   / DISPLAY   'Studio'             WIDTH=25;
 /* The format supplies the background color for each rating value */
 DEFINE Rating   / DISPLAY   'Movie Rating'       WIDTH=12 CENTER
        STYLE(Column)=[FontWeight=bold BackGround=$RatingFmt.];
RUN;

ODS Excel close;

The resulting Excel spreadsheet below illustrates the “default” style, along with the background and foreground color customizations that were used to resolve the color contrast issues described earlier.
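The same format-driven technique can also set the text color, so that each background is paired with a foreground chosen for contrast. The following is a minimal sketch, not the code used to produce the results above: the format names $RatingBg and $RatingFg, the output file name, and the specific color pairings are assumptions for illustration, and the pairings should be checked for adequate contrast before use.

```sas
PROC FORMAT;
 /* Hypothetical formats: background and foreground color per rating */
 VALUE $RatingBg
         'G'     = 'Green'
         'PG'    = 'Light Blue'
         'PG-13' = 'Orange'
         'R'     = 'Red';
 VALUE $RatingFg
         'G', 'PG', 'PG-13' = 'Black'
         'R'                = 'White';
RUN;

/* Hypothetical output file, written to the current working directory */
ODS Excel file='MovieRating_contrast.xlsx';

PROC REPORT DATA=WORK.Movies_Sorted;
 COLUMNS Title Rating;
 DEFINE Title  / DISPLAY 'Movie Title';
 /* Both colors are looked up from the value of Rating in each row */
 DEFINE Rating / DISPLAY 'Movie Rating' CENTER
        STYLE(Column)=[FontWeight=bold
                       BackGround=$RatingBg.
                       ForeGround=$RatingFg.];
RUN;

ODS Excel close;
```

Keeping the two formats side by side makes it easy to review each background and foreground pairing in one place.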

Trademark Citations

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

–––––––––

About the Author

Kirk Paul Lafler is an entrepreneur and founder of Software Intelligence Corporation, and has worked with SAS software since 1979. As a SAS consultant, application developer, programmer, data analyst, educator and author at Software Intelligence Corporation, and an advisor and SAS programming adjunct professor at the University of California San Diego Extension, Kirk has taught SAS courses, seminars, webinars and hands-on workshops to thousands of users around the world.

Kirk has also authored or co-authored several books including Google® Search Complete! (Odyssey Press. 2014) and PROC SQL: Beyond the Basics Using SAS®, Second Edition (SAS Press. 2013); written hundreds of papers and articles on a variety of SAS topics; served as an invited speaker, educator, keynote and section leader at SAS user group conferences and meetings worldwide; and is the recipient of 25 "Best" contributed paper, hands-on workshop (HoW), and poster awards.

Comments and suggestions can be sent to:

Kirk Paul Lafler

SAS® Consultant, Application Developer, Programmer, Data Analyst, Educator and Author

Software Intelligence Corporation

E-mail: KirkLafler@cs.com

LinkedIn: https://www.linkedin.com/in/KirkPaulLafler/

LinkedIn: https://www.linkedin.com/in/Order-of-Magnitude-Analytics/

Twitter: @sasNerd