Introduction to HTML
These are notes on website design for The Australian National University course "Networked Information Systems" (COMP2410 / COMP6340) in 2009. This section of the course is prepared and presented by Tom Worthington FACS HLM.
HTML (Hypertext Markup Language) is the most common mark-up language used for web pages. XHTML is based on HTML, but uses a stricter, XML-based syntax.
The World Wide Web Consortium (W3C) has released several versions of HTML and XHTML. Older versions of HTML include HTML 2.0 (1995, published through the IETF) and HTML 3.2 (1997). Newer versions supported by current web browsers include HTML 4.01 (1999) and XHTML 1.0 (2000). XHTML Basic 1.1 (2008) is a subset of XHTML designed for devices such as mobile phones.
HTML documents should conform to a particular W3C Recommendation (standard) and validate. Mark up of elements should be semantic: descriptive and meaningful to a human reader, as well as a computer.
HTML Structure
An HTML document has:
- a declaration of the version of HTML: the document type (DOCTYPE) declaration.
- a header about the document: the head element, which must include a title.
- a body containing the content to display.
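The three parts listed above can be seen in a minimal sketch of a complete document. This one assumes XHTML 1.0 Strict; the title and paragraph text are placeholders, not part of the course material.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- The DOCTYPE above declares which version of (X)HTML the document follows -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
  <!-- The head describes the document; the title is required -->
  <title>Example page</title>
</head>
<body>
  <!-- The body holds the content that the browser displays -->
  <p>Content to display.</p>
</body>
</html>
```

A document like this can be checked against its declared version with the W3C validator.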
Text Elements
Text elements are used to structure text in the web page. Block elements make up the main document structure: headings, paragraphs, block quotes, pre-formatted text, lists and div.
HTML has six levels of headings to introduce sections of text. By default headings are rendered bold in decreasing size, from h1 to h6. Headings should be used in order (h1, h2, h3 ...) without skipping levels:
<h1>Main heading</h1>
<h2>Sub heading</h2>
Paragraphs are indicated by the <p> element. Paragraphs cannot contain block elements, such as headings or other paragraphs. HTML allows the end </p> tag to be omitted, but XHTML does not.
The <blockquote> element is used for long quotations. Block quotes usually contain paragraphs and may contain headings and lists. Block quotes are usually rendered indented.
Preformatted text <pre> preserves white space (character spaces and line breaks). This is used where spacing is important for meaning, such as in computer source code. Preformatted text is usually rendered as a fixed-width font.
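As a small illustration of the block elements described above, the following sketch combines a heading, a paragraph, a block quote and preformatted text. The wording and the code inside the pre element are invented for the example.

```html
<h1>Course notes</h1>
<p>A paragraph of ordinary text.</p>
<blockquote>
  <!-- A block quote normally wraps its text in block elements such as p -->
  <p>A long quotation, marked up as a paragraph inside the block quote.</p>
</blockquote>
<!-- Spaces and line breaks inside pre are kept as typed -->
<pre>
for (i = 0; i &lt; 3; i++) {
    print(i);
}
</pre>
```

The less-than sign in the preformatted code is written as a character reference, because white space is preserved inside pre but mark-up is still interpreted.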
Inline Elements
Inline elements span characters within the flow of text; by default they do not add line breaks. Commonly used logical elements in later versions of HTML are:
- em: for emphasised text, usually rendered in italics: <em>emphasised text</em>
- strong: for strongly emphasised text, usually rendered in bold: <strong>strong text</strong>
Div and Span
div and span elements provide a way to mark-up the document with no default rendering:
- div element identifies a block-level division of text:
<div id="header"><p>Heading for the top of the page.</p></div>
- span element identifies an inline element:
<p>Identify <span id="key">a key point</span>.</p>
Element Identifiers
id and class attributes are used to provide specific meaning for div and span elements. They can also be used with other elements.
- id attribute gives a name to an element that must be unique within the document. It is usually used to select the element in a style sheet, or as a target anchor for links.
- class attribute groups similar elements. It is usually used to apply the same style sheet rule to every element in the group.
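A minimal sketch of how the two attributes might be used together with a style sheet follows. The names "header" and "note" and the style rules are assumptions chosen for illustration; the style element would be placed in the head of the document.

```html
<!-- In the head: rules selecting by id (#) and by class (.) -->
<style type="text/css">
  #header { border-bottom: 1px solid black; }
  .note   { font-style: italic; }
</style>

<!-- In the body: one uniquely named element, and two elements sharing a class -->
<div id="header"><p>Heading for the top of the page.</p></div>
<p class="note">First note.</p>
<p class="note">Second note.</p>
```

The id rule applies to the single header division, while the class rule applies to both paragraphs.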
Lists
Unordered lists use the ul element and ordered lists the ol element. Both use the li element for each item of the list. By default each entry displays on a new line, prefixed by a bullet for unordered lists and a number for ordered lists. Ordered and unordered lists may be nested. By default different bullet shapes are used for each level of nested unordered lists. Style sheets can be used to change the shape of the bullets, to change the numbering system used for ordered lists, and to restart the numbering sequence.
<ul>
<li>Item</li>
<li>Another item</li>
</ul>
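Nesting can be sketched as follows, with an unordered list placed inside an item of an ordered list. The item text is invented for the example.

```html
<ol>
  <li>First step
    <!-- The nested list sits inside the li it belongs to -->
    <ul>
      <li>A detail</li>
      <li>Another detail</li>
    </ul>
  </li>
  <li>Second step</li>
</ol>
```

Note that the nested ul is placed before the closing tag of its parent li, not between two li elements.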
Character Entity References
Character references are used to insert special characters into HTML. Escaped characters are indicated by a Numeric Character Reference (NCR) or a predefined character entity name between & and ;. Common character entities are &lt; and &gt; for the less-than (<) and greater-than (>) symbols, and &amp; for the ampersand (&).
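A short sketch of character references in use follows; the sentences are invented for the example. The first two paragraphs use predefined entity names and the third uses a numeric character reference.

```html
<!-- Displays as: Use <em> for emphasis. -->
<p>Use &lt;em&gt; for emphasis.</p>
<!-- Displays as: Research & Development -->
<p>Research &amp; Development</p>
<!-- &#169; is the numeric reference for the copyright sign -->
<p>Copyright &#169; 2009</p>
```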
Hypertext Links
A link can be created from HTML to any web resource, including another HTML document, or an element within the document. The anchor element a identifies text or an image that links to another document. The href attribute provides the address (URL) to link to. Linked text is usually rendered in blue and underlined, and linked images with a blue border. This can be changed with style sheets.
<a href="http://wattlecourses.anu.edu.au/">ANU Wattle</a>
Absolute URLs have a protocol identifier, a hostname, and path to the specific file name. Usually HyperText Transfer Protocol (http) is used for web pages.
Relative URLs are relative to the location of the current document and usually used for links within the same document or web site.
<a href="wattle.html">ANU Wattle</a>
Later versions of HTML use the id attribute on any element to provide a target within a document for a link. Earlier versions use the name attribute in the a element. To link to a target, its identifier is placed after a hash (#) symbol appended to the URL.
<a id="details" href="wattle.html#timetable">ANU Wattle Timetable</a>
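The two halves of such a link can be sketched separately: the element that acts as the target, and the links that point to it. The heading text is an assumption; the file name and identifier follow the example above.

```html
<!-- In wattle.html: an element marked as a link target -->
<h2 id="timetable">Timetable</h2>

<!-- In another document: a link to that target -->
<a href="wattle.html#timetable">ANU Wattle Timetable</a>

<!-- Within wattle.html itself, the fragment alone is enough -->
<a href="#timetable">Timetable</a>
```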
Windows
The target attribute in the a element allows the linked document to be opened in a new window by the web browser. Using "_blank" for the target attribute opens a new browser window every time; using another name allows the second window to be reused. The target attribute should be used sparingly to avoid confusing the user with numerous windows.
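The difference between the two kinds of target value can be sketched as follows. The window name "course" is a hypothetical name chosen for the example. The target attribute is not part of the Strict document types, so this sketch assumes a Transitional DOCTYPE.

```html
<!-- Opens a new window every time the link is followed -->
<a href="wattle.html" target="_blank">ANU Wattle</a>

<!-- Opens a window named "course" the first time, then reuses it -->
<a href="wattle.html" target="course">ANU Wattle</a>
```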
Images and Objects
Web pages can include images and media objects, such as video, Java applets and other HTML documents. Inline images occur in the normal flow of the document's content, with the img element. Web images are usually in GIF, JPEG, or PNG format, named .gif, .jpeg or .jpg, or .png. Inline images can be used as an alternative to text links.
<img src="http://anu.edu.au/logo.gif" alt="ANU" />
Decorative images which are not part of the information content of the document should not be inline images and can be replaced using an external style sheet. The img element requires the src attribute to provide the URL of the image file, and alt to provide alternative text if the image cannot be displayed (or the user cannot see images). An alt attribute with an empty string (alt="") should be used for decorative images which do not convey any information.
By default, the bottom of an image aligns with the baseline of surrounding text. This can be changed with style sheets. The width and height attributes specify the dimensions of the image, and the browser will resize the image to match them. However, resizing the image larger results in a poor quality, pixelated image, and resizing it smaller results in an unnecessarily large file download.
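The attributes discussed above might be combined as in the following sketch. The pixel dimensions and the file name divider.gif are assumptions for illustration; in practice the width and height should match the actual size of the image file.

```html
<!-- An informative image: alt text describes it, dimensions match the file -->
<p><img src="http://anu.edu.au/logo.gif" alt="ANU" width="120" height="60" /></p>

<!-- An image used as a link in place of text -->
<p><a href="http://wattlecourses.anu.edu.au/"><img src="http://anu.edu.au/logo.gif" alt="ANU Wattle" width="120" height="60" /></a></p>

<!-- A purely decorative image: empty alt text -->
<p><img src="divider.gif" alt="" width="400" height="5" /></p>
```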
Embedded Media
Embedded media files, such as video, Flash and Java applets can be included in web pages. The browser renders using provided code or by taking advantage of a plug-in application. The object element is recommended for later versions of HTML but applet may be needed for earlier versions. The non-standard embed may be needed for some implementations.
The object element can be used for applets, movies and interactive objects (Flash). It may contain param elements to pass parameters to the object. Three types of information may be needed by the object:
- classid attribute: to identify the executable code, or player (such as the QuickTime plug-in for a .mov file).
- data attribute: to give the URL of the data file.
- param elements: additional settings, such as an "AutoStart" feature.
The noembed element provides alternative content if the browser cannot display the media file. For example, text and a still image could be displayed if a movie cannot be played.
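A minimal sketch of the object element follows. It assumes a QuickTime movie with the hypothetical file name lecture.mov, identifies the format with the type attribute rather than a classid, and uses "autoplay" as an example parameter name; the parameters actually recognised depend on the player. With object, content nested inside the element serves as the alternative if the media cannot be displayed.

```html
<object data="lecture.mov" type="video/quicktime" width="320" height="256">
  <!-- A setting passed to the player; the name depends on the plug-in -->
  <param name="autoplay" value="false" />
  <!-- Alternative content, shown only if the movie cannot be played -->
  <p>The movie cannot be played. <a href="lecture.mov">Download the movie</a>.</p>
</object>
```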
Tables
Table elements present rows and columns of tabular data. The table element defines a table, with tr for table rows and td for creating table cells. A table cell can contain formatted text, images, multimedia elements, and other tables.
<table>
<tbody>
<tr>
<td>row 1 column 1</td>
<td>row 1 column 2</td>
<td>row 1 column 3</td>
</tr>
<tr>
<td>row 2 column 1</td>
<td>row 2 column 2</td>
<td>row 2 column 3</td>
</tr>
</tbody>
</table>
Colspan and rowspan allow a table cell to occupy more than one cell space. Table header cells th provide descriptions of the cells in the row or column that they precede. The caption element gives a caption for the table. Table headers and captions are important for accessibility.
Rows of the table can optionally be organised into a header (thead), a footer (tfoot), and a table body (tbody). The tfoot element should be before tbody in the mark-up, although it will normally be rendered after the content of the tbody.
Accessible Tables
Sight-impaired users have difficulty reading tables. Tables can be made more accessible by providing a short descriptive caption element and a longer summary attribute. The summary attribute is not usually rendered by visual browsers.
Table headers (th) provide a description for a column or row. The abbr attribute can be used to provide an alternate short version of the header title. The scope attribute can be used to declare associations between table headers and rows or columns in complex tables.
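The accessibility features described above can be sketched in one table. The enrolment figures are invented placeholders, not real data, and the caption and summary wording are assumptions.

```html
<table summary="Enrolments in each course by semester, with totals.">
  <caption>Course enrolments</caption>
  <thead>
    <tr>
      <!-- scope="col": this header describes the column below it -->
      <th scope="col">Course</th>
      <th scope="col" abbr="Sem 1">Semester 1</th>
      <th scope="col" abbr="Sem 2">Semester 2</th>
    </tr>
  </thead>
  <!-- tfoot comes before tbody in the mark-up but is rendered last -->
  <tfoot>
    <tr><th scope="row">Total</th><td>70</td><td>65</td></tr>
  </tfoot>
  <tbody>
    <!-- scope="row": this header describes the cells in its row -->
    <tr><th scope="row">COMP2410</th><td>50</td><td>45</td></tr>
    <tr><th scope="row">COMP6340</th><td>20</td><td>20</td></tr>
  </tbody>
</table>
```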
Frames
Frames allow the browser window to be divided into sub-windows, each with a different HTML document.
A frameset document defines the frames. It uses the frameset element instead of body. It defines and names the frames, arranged in rows and columns, each with a frame element. The noframes element, after the frameset, provides alternative content for web browsers which cannot display frames and for web search engines.
Some versions of HTML do not permit frames and they may result in overly complex designs. They are not discussed further here.
Forms
Forms provide for data entry and user interaction with the web page. The form element defines a form. The form may contain text, images and tables. The action attribute in the form element provides the URL of the program to be used for processing the form. The name attributes of form control elements provide the variable names.
The method attribute specifies either get or post for submitting the form information to the server. The simpler get method transfers the data appended to the action URL after a question mark. The post method transmits the data separately, in the body of the request, which keeps it out of the URL and allows more data to be sent. With either method, encryption is provided by using a secure (https) action URL.
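A minimal sketch of a form follows. The action URL /cgi-bin/register and the control name "student" are hypothetical; a real form would use the URL of whatever program processes it.

```html
<form action="/cgi-bin/register" method="post">
  <p>
    <label for="student">Name:</label>
    <!-- The name attribute becomes the variable name sent to the server -->
    <input type="text" id="student" name="student" />
    <input type="submit" value="Send" />
  </p>
</form>
```

If method were changed to get, the same data would instead be appended to the action URL after a question mark, in the form student= followed by the value entered.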
Now read: Web Design in a Nutshell, Jennifer Niederst Robbins, O'Reilly Media, Inc. 2006:
- Chapter 8. HTML and XHTML Overview
- Chapter 9. Document Structure
- Chapter 10. Text Elements
- Chapter 11. Creating Links
- Chapter 12. Images and Objects
- Chapter 13. Tables
- Chapter 14. Frames
- Chapter 15. Forms