September 22, 2011

Content Syndicator

The CyberSEO plugin is able to syndicate almost any content source. Basically there are two content source types:

1) RSS/Atom, XML, JSON feeds and HTML documents.

2) CSV files and raw text dumps. Here you have to select the dump format and the desired post structure.

To syndicate a new content source, choose the appropriate window (feed or CSV), enter the feed URL (beginners are advised to use the presets) or paste the CSV, text dump or its URL, and submit it to CyberSEO. Right after that you will be able to adjust the content source settings. All supported sources, even raw text dumps, are processed as XML feeds, so they share the same options.

Syndicate this feed to the following categories

Here you can select the existing WordPress categories to which all syndicated posts will be assigned.

Extract full articles

When this mode is enabled, the CyberSEO plugin will try to automatically extract full-text articles from shortened RSS feeds. The feature relies on the full-text-rss script, which you can download for free from here or from here. Unzip it and upload the files into the /wp-content/ or /wp-content/plugins/cyberseo/ folder. After that, full-text extraction will be enabled and the “Extract full articles” option will become available.

Proxy mode

This option enables or disables the use of the proxies listed in General Settings. Proxies are usually needed to parse services that do not allow frequent connections from the same IP address.

Attribute all posts to the following user

This option allows you to choose the author (registered WordPress user) for all posts that will be syndicated from the selected XML/RSS feed.

XML section tag names (separate with commas)

This is a very important option for syndicating and parsing unconditioned XML feeds, that is, feeds that do not follow the standard RSS structure. As you may know, the content of every RSS entry is enclosed within <item> tags, e.g.:

<item>
    <title>Example entry</title>
    <description>Here is some text containing an interesting description.</description>
    <link>http://www.wikipedia.org/</link>
    <guid>unique string per item</guid>
    <pubDate>Sun, 06 Sep 2009 16:45:00 +0000</pubDate>
</item>

Any content automation plugin can recognize and parse RSS feeds properly, since they have a very formalized structure. Parsing an unconditioned XML feed is a harder problem, and one that other existing content automation plugins can’t handle. Fortunately this is not a problem for CyberSEO.

Let’s say we need to parse an XML feed whose entries are enclosed within <product> tags and have the following structure:

<product>
    <id>PRODUCT-73182373</id>
    <name>DuroStar DS7200Q Remote Control Silent Diesel Generator</name>
    <description><img url="http://static1.ostkcdn.com/images/products/P12338714.jpg"><br />Perfect for the job site or the residential user who needs back-up power Features: Electric start on panel plus remote start. Advanced direct fuel injection system for low fuel consumption. Dependable, maintenance free alternator with automatic voltage regulator. Full power panel with keyed ignition, oil alert automatic shutdown for low oil pressure, volt meter, circuit breakers, low fuel indicator and power outlets. 12 hours Continuous Operating Capability. Fully protected 12V, 120V & 220V outlets. Ground Fault Interrupter. DC 12V charging system.
    </description>
    <price>$1,000</price>
</product>

The first thing we need to do is teach the CyberSEO plugin to recognize the new enclosing tags. To do so, go to the “XML Syndicator” menu and click the “Alter default settings” button. Then locate the “XML section tag names (separate with commas)” field, add the new enclosing tag there (e.g. PRODUCT) and click “Update Default Settings”. Done! Now the CyberSEO plugin will be able to recognize XML feeds where all items are enclosed within the <product> tag.

However, we still can’t pull such feeds into the blog without some additional settings, because the example feed above contains only one standard RSS tag, <description>. Important tags such as <title> and <guid> are missing. Furthermore, there are some additional tags (<name>, <id> and <price>) we may want to include in the post.

To solve this problem, we’ll use the “Custom Fields” and “PHP Code <?php .. ?>” options (find their detailed descriptions below). Let’s assume that we have already added the <product> tag to the list of recognized ones via the “XML section tag names” text field on the “XML Syndicator – Default Settings” page, saved the changes and added the new XML feed using the “New Feed URL:” box.

Now we have to create the post title and GUID from the feed data. We can use the value of the <id> tag as the post GUID and the value of <name> as the post title. We will also take the value of the <price> tag and place it below the product description.

First of all, we have to insert the following lines into the “Custom Fields” box:

id->guid
name->post_title
price->price

This will tell CyberSEO that the value of <id> must be assigned to the custom field “guid”, the value of <name> must be assigned to the custom field “post_title” and the value of <price> must be assigned to the custom field “price”.

In the second step, we need to assign these custom field values to the appropriate elements of the syndicated post. We can do it by placing the following code into the “PHP Code <?php .. ?>” box:

$post['guid'] = $post['custom_fields']['guid'];
$post['post_title'] = $post['custom_fields']['post_title'];
$post['post_excerpt'] = $post['post_excerpt'] . "<br /><b>Price:</b> " . $post['custom_fields']['price'];

Now we just need to click “Update Feed Settings” and voila – we are ready to automatically parse this unconditioned XML feed as a source of new posts for our blog!
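The three lines in the “PHP Code” box above assume that every item really carries <id>, <name> and <price>. A slightly more defensive variant is sketched below. The sample $post array at the top is only a stand-in for the one the plugin supplies (its excerpt text is shortened for illustration), and only the marked part would go into the box.

```php
<?php
// Stand-in for the $post array that CyberSEO passes to the "PHP Code" box.
// Assumption: the values below are sample data, not real plugin output.
$post = array(
    'post_title'    => '',
    'post_excerpt'  => 'Perfect for the job site or the residential user.',
    'guid'          => '',
    'custom_fields' => array(
        'guid'       => 'PRODUCT-73182373',
        'post_title' => 'DuroStar DS7200Q Remote Control Silent Diesel Generator',
        'price'      => '$1,000',
    ),
);

// --- the part below is what would go into the "PHP Code" box ---
$fields = isset($post['custom_fields']) ? $post['custom_fields'] : array();

// Only overwrite the GUID and title when the feed really provided them.
if (!empty($fields['guid'])) {
    $post['guid'] = $fields['guid'];
}
if (!empty($fields['post_title'])) {
    $post['post_title'] = $fields['post_title'];
}
// Append the price line only when a price is present.
if (!empty($fields['price'])) {
    $post['post_excerpt'] .= '<br /><b>Price:</b> ' . $fields['price'];
}
```

With this variant, an item that lacks a <price> tag simply keeps its excerpt unchanged instead of ending with an empty “Price:” line.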

Undefined categories

This option defines what the XML/RSS syndicator must do with posts whose categories are not predefined in the blog. Here you can choose one of four actions:

  • Use XML syndicator default settings – the default XML syndicator rules will be applied. You can set these rules by clicking the “Alter default settings” button on the main “XML Syndicator” page.
  • Post to default WordPress category – the syndicated post will be assigned to the default category. No additional categories will be created.
  • Create new categories defined in syndicating post – all previously undefined categories, specified by the <categories> tag of the syndicated post, will be created automatically.
  • Do not syndicate post that doesn’t match at least one category defined above – if the syndicated post doesn’t match at least one of the predefined categories, it will not be added to the blog.

Create tags from category names

When this option is enabled, the CyberSEO plugin will automatically create WordPress tags from the categories, specified by the <categories> tag of the syndicating post.

Post tags

This text field allows you to create your own tags and assign them to all posts that will be syndicated from the selected feed.

Check for duplicate posts by

Use this option to define how CyberSEO detects and ignores duplicate posts:

  • GUID and title – the post will not be added if either its GUID or its title matches one of the existing posts.
  • GUID only – the post will not be added if its GUID matches one of the existing posts.
  • Title only – the post will not be added if its title matches one of the existing posts.
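The three modes can be sketched as a small helper. This is only an illustration of the rules as described above, not the plugin’s actual code: the function name is_duplicate_post() and the mode names 'guid_and_title', 'guid' and 'title' are invented here.

```php
<?php
/**
 * Hypothetical helper: returns true when $post should be skipped as a duplicate.
 * $existing is a list of arrays with 'guid' and 'post_title' keys.
 * $mode is one of 'guid_and_title', 'guid' or 'title' (names invented here).
 */
function is_duplicate_post(array $post, array $existing, $mode)
{
    foreach ($existing as $old) {
        $sameGuid  = $old['guid'] === $post['guid'];
        $sameTitle = $old['post_title'] === $post['post_title'];

        if ($mode === 'guid' && $sameGuid) {
            return true;
        }
        if ($mode === 'title' && $sameTitle) {
            return true;
        }
        // "GUID and title" rejects the post when EITHER value matches.
        if ($mode === 'guid_and_title' && ($sameGuid || $sameTitle)) {
            return true;
        }
    }
    return false;
}

// Usage with made-up data: same title, different GUID.
$existing = array(array('guid' => 'a-1', 'post_title' => 'Hello'));
$incoming = array('guid' => 'b-2', 'post_title' => 'Hello');
$skipByGuid = is_duplicate_post($incoming, $existing, 'guid');
$skipByBoth = is_duplicate_post($incoming, $existing, 'guid_and_title');
```

Note that “GUID and title” is the strictest mode: a post republished under a new GUID but with an unchanged title is still treated as a duplicate.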

Check this feed for updates every…

Use this option to set the interval, in minutes, at which CyberSEO checks the selected feed for updates. If you don’t want the feed to be checked automatically, simply set this value to 0.

Maximum number of posts to be syndicated from each feed at once

Here you can set the maximum number of posts that can be syndicated from the selected feed each time the plugin checks it for updates. Low values are recommended to avoid overloading the server. They will also make your self-populating site look like an organically growing one. Search engines don’t like “blogs” that add 100 new posts at once…

Posts Status

This drop-down menu allows you to define the status of syndicated posts. The options barely need any additional description:

  • Hold for review
  • Save as draft
  • Save as private

Comments

Use this option to enable or disable comments on the syndicating posts.

Pings

This option defines the ping status for the syndicated posts. You may either allow pings or forbid them.

Base date

Here you can choose the base date for the syndicated posts. If you select “Get date from post”, all new posts will be added with the dates specified in the RSS/XML feed (provided the feed actually specifies them). If you select “Use syndication date”, the new posts will be stamped with the actual date of their syndication.

Media Attachments

Some RSS feeds have so-called media attachments (usually thumbnail images) enclosed within <media:content>, <media:thumbnail> and <enclosure> (only the “image” type is supported) tags. The drop-down menu lets you decide what to do with these types of attachments:

  • Do not insert attachments – the attachment will be ignored and won’t be inserted into the post.
  • Insert attachments at the top of the post – the attachment will be placed at the top of the post.
  • Insert attachments at the bottom of the post – the attachment will be placed at the bottom of the post.

If you decide to insert the attachments, they will be wrapped in an <img> tag, so you will be able to assign them a custom style.

Post links handling

Sometimes you may want to automatically remove certain types of links from the syndicated posts. The “Post links handling” option allows you to do one of the following operations:

  • Keep links intact
  • Remove all links
  • Remove all links except links in images
  • Remove links from images only
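To make the difference between the four operations concrete, here is a rough regex-based sketch. It is an assumption about how such link stripping could work, not the plugin’s implementation; the function name strip_links() and the mode names are hypothetical, and a regex like this only copes with simple, non-nested <a> tags.

```php
<?php
/**
 * Hypothetical sketch of the four link-handling operations.
 * $mode: 'keep', 'all', 'all_except_images' or 'images_only' (invented names).
 */
function strip_links($html, $mode)
{
    if ($mode === 'keep') {
        return $html;
    }
    return preg_replace_callback(
        '#<a\b[^>]*>(.*?)</a>#is',
        function ($m) use ($mode) {
            // $m[0] is the whole link, $m[1] is what the link wraps.
            $wrapsImage = stripos($m[1], '<img') !== false;
            if ($mode === 'all') {
                return $m[1];
            }
            if ($mode === 'all_except_images') {
                return $wrapsImage ? $m[0] : $m[1];
            }
            if ($mode === 'images_only') {
                return $wrapsImage ? $m[1] : $m[0];
            }
            return $m[0];
        },
        $html
    );
}

// Usage with a made-up fragment: one text link and one linked image.
$html = '<a href="/x">text</a> <a href="/y"><img src="i.jpg" /></a>';
$noLinks      = strip_links($html, 'all');
$imageLinks   = strip_links($html, 'all_except_images');
$unlinkedImgs = strip_links($html, 'images_only');
```

In every mode the link text or image itself stays in the post; only the surrounding <a> tag is dropped.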

Post thumbnail

When the “Generate from the first post image” option is selected, the CyberSEO plugin will generate post thumbnails (also known as featured images) using one of the following sources:

  • first image found in the post content or excerpt;
  • random image found in the post content or excerpt;
  • media attachment thumbnail;
  • contents of the “thumb” custom field.

If the plugin is not able to create the post thumbnail (e.g. the source image is missing or broken), the post will be deleted. So if you are going to enable post thumbnail generation, please make sure that images are available in your content source.
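Because a post without a usable image is dropped, it can help to check beforehand whether the content contains an image at all. The sketch below shows one way to find the first image URL in a piece of HTML; first_image_url() is a hypothetical helper, and how the plugin itself locates images is not documented here.

```php
<?php
/**
 * Hypothetical helper: returns the src of the first <img> tag in $html,
 * or null when the fragment contains no image.
 */
function first_image_url($html)
{
    $pattern = '#<img\b[^>]*?\bsrc\s*=\s*["\']([^"\']+)["\']#i';
    if (preg_match($pattern, $html, $m)) {
        return $m[1];
    }
    return null;
}

// Usage with made-up content.
$content = '<p>Hi</p><img class="x" src="http://example.com/a.jpg" /><img src="b.jpg">';
$thumb   = first_image_url($content);
$missing = first_image_url('<p>No images here</p>');
```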

Morph Headers and Footers

Use this option to enable or disable morphing/synonymizing of the headers and footers (see below) before their insertion into the syndicating posts.

Don’t Morph Titles

Use this option to allow or disallow synonymizing of the syndicating post titles.

Shuffle Paragraphs

Enable this option if you want to shuffle paragraphs of the syndicating posts.

Parse WordPress archives

If this option is enabled and the syndicated RSS feed was generated by another WordPress blog, the CyberSEO Suite will pull all the published posts from that blog, not only those available in the feed. The plugin will parse through the WordPress archives, discover and aggregate every single post available there, no matter how many years the blog has been running or how many posts it contains. Just enable “Parse WordPress archives” and CyberSEO will grab the whole blog at once or do it post by post at every given time interval. In other words, if a target WordPress blog has 1,000 posts and only the 10 most recent ones appear in its RSS feed, the CyberSEO plugin will aggregate all 1,000.

This is a unique and very powerful function which is beyond the capabilities of other popular content automation scripts. So please use it wisely and DO NOT STEAL copyrighted content from other blogs without their owners’ permission! Otherwise you may face serious legal problems.

UTF-8 Encoding

This option converts an ISO-8859-1 string into UTF-8, which may be required when parsing XML/RSS feeds that contain illegal UTF-8 start bytes (e.g. <0x92>). Use it to aggregate even invalid feeds that cannot be parsed by other XML/RSS syndicators.

Convert Character Encoding

Enable this function if the selected feed is delivered in a national character set (e.g. windows-1251) to automatically convert it into UTF-8.
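The two encoding options can be illustrated with PHP’s mb_convert_encoding(). This is a sketch of the general technique, assuming the mbstring extension is installed; it does not claim to be what the plugin calls internally, and the sample strings are made up.

```php
<?php
// Assumption: the mbstring extension is available in this PHP build.

// "UTF-8 Encoding" case: treat the bytes as ISO-8859-1 and re-encode them.
$latin1 = "It\x92s here";                  // a lone 0x92 byte is not valid UTF-8
$wasValid = mb_check_encoding($latin1, 'UTF-8');
$fixed  = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$isValid  = mb_check_encoding($fixed, 'UTF-8');

// "Convert Character Encoding" case: a string delivered as windows-1251.
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";      // the Russian word "Привет"
$utf8   = mb_convert_encoding($cp1251, 'UTF-8', 'Windows-1251');
```

One caveat: under ISO-8859-1 the byte 0x92 maps to a control character, so the result is valid UTF-8 but not a visible apostrophe. Feeds that use 0x92 as a curly apostrophe are usually Windows-1252, in which case 'Windows-1252' would be the more fitting source encoding.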

Store Images Locally

If enabled, all images from the syndicated feeds will be copied into the default uploads folder of your blog. Make sure that your “/wp-content/uploads” folder is writable. This may speed up your blog’s loading time and will also let you syndicate posts containing hotlink-protected images.

Post Date Adjustment Range

Here you can set the syndication date adjustment range in minutes.
This range will be used to randomly adjust the publication date of every aggregated post. For example, if you set the adjustment range as [-60] .. [60], each post date will be shifted by a random amount between -60 and +60 minutes.
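The arithmetic behind the adjustment is simple and can be sketched as follows. The helper adjust_post_date() is hypothetical; it assumes the post date is a Unix timestamp in seconds and that the range bounds are whole minutes.

```php
<?php
/**
 * Hypothetical helper: shifts a Unix timestamp by a random number of
 * minutes taken from the range [$minMinutes, $maxMinutes].
 */
function adjust_post_date($timestamp, $minMinutes, $maxMinutes)
{
    // random_int() includes both bounds (available since PHP 7).
    $offsetMinutes = random_int($minMinutes, $maxMinutes);
    return $timestamp + $offsetMinutes * 60;
}

// Usage: spread a post dated 2011-09-22 00:00:00 UTC by up to an hour.
$original = 1316649600;
$adjusted = adjust_post_date($original, -60, 60);
```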

Custom Fields

Here you can assign XML tag values to the custom fields of the syndicated post, using the following format:
tag->name – one line per field.
Where:
tag – the tag of the aggregated post you want to parse. This parameter is case insensitive;
name – the name of the WordPress post custom field to write the value to. This parameter is case sensitive.

The example below will store the values of the <name> and <isbn> tags in the “book_title” and “book_isbn” custom fields:

name->book_title
isbn->book_isbn

Storing XML tag values as custom fields is necessary for syndicating unconditioned XML feeds (find an example in the description of the “XML section tag names” field above).
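To show how the tag->name format can be read, here is a small sketch that parses the rules and applies them to a set of tag values. Both function names are hypothetical and the book data is made up; the sketch only mirrors the stated rules (tag names case insensitive, field names case sensitive).

```php
<?php
/**
 * Hypothetical parser for the "tag->name" lines of the Custom Fields box.
 * Returns an array of lowercase tag name => custom field name.
 */
function parse_custom_field_rules($text)
{
    $rules = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $parts = explode('->', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip blank or malformed lines
        }
        $tag  = strtolower(trim($parts[0])); // tags are case insensitive
        $name = trim($parts[1]);             // field names keep their case
        if ($tag !== '' && $name !== '') {
            $rules[$tag] = $name;
        }
    }
    return $rules;
}

/** Copies matching XML tag values into custom fields. */
function apply_custom_field_rules(array $rules, array $xmlTags)
{
    $fields = array();
    foreach ($xmlTags as $tag => $value) {
        $key = strtolower($tag);
        if (isset($rules[$key])) {
            $fields[$rules[$key]] = $value;
        }
    }
    return $fields;
}

// Usage with made-up tag values.
$rules  = parse_custom_field_rules("name->book_title\nISBN->book_isbn");
$fields = apply_custom_field_rules($rules, array(
    'Name'  => 'Example Book',
    'isbn'  => '0000000000',
    'pages' => '123',
));
```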

Post Title, Post Content and Post Excerpt

CyberSEO ver. 7 allows one to define HTML templates for post title, post content and post excerpt. You can use these text boxes to define the layout and contents of the posts, generated by the plugin. The following predefined tags are available:

  • %post_title% – post title;
  • %post_content% – post content;
  • %post_excerpt% – post excerpt;
  • %link% – a link to source;
  • %post_guid% – post GUID;
  • %media_description% – post media description (if media attachments are included);
  • %enclosure_url% – the enclosure URL (if attachment enclosure is included);
  • %xml_tags[name]% – an XML tag value, where name must be replaced with the actual XML tag name;
  • %custom_fields[name]% – a custom field value, where name must be replaced with the actual custom field name;
  • %custom_fields_attr[name][attr]% – a custom field attribute, where name must be replaced by the actual custom field name and attr must be replaced by the XML attribute name;
  • %media_thumbnail[n]% – a link to the media thumbnail where n is its integer index (if media attachments are included).

E.g. if you want to add some text to every post title (say “Breaking news:”), you should alter the “Post title” template like this:

Breaking news: %post_title%

Here is another example. Let’s say you are going to import a product XML feed which has the following items:

<product_name>the product name here</product_name>
<picture>the product picture URL here</picture>
<price>the product price here</price>

The only thing you have to do is to simply define your “Post template” like this:

<h3>%xml_tags[product_name]%</h3>
<img src="%xml_tags[picture]%" />
<p><b>Price:</b> %xml_tags[price]%</p>
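Conceptually, a template like the one above is filled in by replacing each %xml_tags[name]% placeholder with the value of the matching tag. The sketch below illustrates that idea for this one placeholder type only; render_template() is a hypothetical name, the product data is made up, and the plugin’s own substitution rules (for example for missing tags) may differ.

```php
<?php
/**
 * Hypothetical sketch: replaces %xml_tags[name]% placeholders in $template
 * with values from $xmlTags. Unknown names become an empty string.
 */
function render_template($template, array $xmlTags)
{
    return preg_replace_callback(
        '/%xml_tags\[([^\]]+)\]%/',
        function ($m) use ($xmlTags) {
            return isset($xmlTags[$m[1]]) ? $xmlTags[$m[1]] : '';
        },
        $template
    );
}

// Usage with made-up product data.
$template = '<h3>%xml_tags[product_name]%</h3><p><b>Price:</b> %xml_tags[price]%</p>';
$html = render_template($template, array(
    'product_name' => 'Example Widget',
    'price'        => '$10',
));
```

The values are inserted as-is, so a feed that carries HTML in its tags will have that HTML end up in the post.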

Post Headers and Post Footers

Everything that you put into these boxes will be added to the top or bottom, respectively, of every syndicated post. You can also use the post headers and footers to add a random piece of text or HTML code to every syndicated post. In this case, the chunks of text must be separated with the “<!--more-->” marker. For example, if you put the following text into the “Post Footers” box, every syndicated post will end with a random Mark Twain quote:

I was gratified to be able to answer promptly. I said I don't know.
<!--more-->
If you pick up a starving dog and make him prosperous, he will not bite you. This is the principal difference between a dog and a man.
<!--more-->
If you tell the truth you don't have to remember anything.
<!--more-->
In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their language.
<!--more-->
In religion and politics, people's beliefs and convictions are in almost every case gotten at second hand, and without examination.
<!--more-->
In the first place, God made idiots. That was for practice. Then he made school boards.
<!--more-->
It could probably be shown by facts and figures that there is no distinctly American criminal class except Congress.
<!--more-->
It is easier to stay out than get out.

You can also use the predefined tags (see Post Title, Post Content and Post Excerpt), which are available for the post title, post content and post excerpt templates. For example, if you want to include a link to the source at the bottom of every generated post, you can do it by inserting the following line into the “Post Footers” box:

<p><a href="%link%">Source</a></p>
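The random selection between chunks can be pictured with the following sketch, which splits the box contents on the “<!--more-->” marker and picks one chunk. pick_random_chunk() is a hypothetical helper written for illustration, not the plugin’s code.

```php
<?php
/**
 * Hypothetical helper: splits the header/footer box text on the
 * <!--more--> marker and returns one randomly chosen, trimmed chunk.
 */
function pick_random_chunk($boxText)
{
    $chunks = array_map('trim', explode('<!--more-->', $boxText));
    $chunks = array_filter($chunks, 'strlen'); // drop empty chunks
    if (!$chunks) {
        return '';
    }
    return $chunks[array_rand($chunks)];
}

// Usage with two of the quotes from the example above.
$box = "It is easier to stay out than get out.\n<!--more-->\n"
     . "If you tell the truth you don't have to remember anything.";
$footer = pick_random_chunk($box);
```

A box without any marker contains a single chunk, so the same text is added to every post.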

PHP Code <?php .. ?>

This is the most powerful tool, intended for advanced users who are familiar with PHP. Use it to execute your own PHP code for every aggregated post, whose data is represented by the $post array variable. You can alter this variable in order to apply your own changes to every aggregated post (find an example in the description of the “XML section tag names” field above). The following items are defined in the $post array variable:

  • $post['post_title'] – post title;
  • $post['post_name'] – post name;
  • $post['link'] – link;
  • $post['post_content'] – post content;
  • $post['post_excerpt'] – post excerpt;
  • $post['guid'] – GUID;
  • $post['post_author'] – post author;
  • $post['post_date'] – post date in Unix timestamp format;
  • $post['categories'] – array of post category names;
  • $post['media_content'] – array of media content URLs;
  • $post['media_thumbnail'] – array of media thumbnail URLs;
  • $post['enclosure_url'] – enclosure URL;
  • $post['enclosure_type'] – type of enclosure URL, e.g. audio/mpeg;
  • $post['comments'] – array that may contain post comments in WordPress format;
  • $post['custom_fields'] – array of post custom fields;
  • $post['custom_fields_attr'] – array of XML attributes associated with the post custom fields;
  • $post['tags_input'] – an empty array. You can use it to assign additional tags to syndicated posts, e.g.:
    $post['tags_input'] = array('tag1', 'tag2', 'tag3', 'etc');

If you modify the $post array (say, change the post title, insert some text into the post’s body, or create your own tags according to the post’s content), your changes will be applied to the post BEFORE it is processed by CyberSEO and added to the blog. If you don’t want a particular post to be syndicated (say, its body contains some undesirable words), your PHP code must assign “false” to the $post variable.
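Both cases, rejecting a post and tagging it by content, can be combined in one snippet. The sketch below is an illustration only: the sample $post array stands in for the one the plugin supplies, and the word lists are hypothetical. Only the marked part would go into the “PHP Code” box.

```php
<?php
// Stand-in for the $post array supplied by the plugin (sample data only).
$post = array(
    'post_title'   => 'Diesel generator review',
    'post_content' => 'A quiet diesel generator for the job site.',
    'tags_input'   => array(),
);

// --- the part below is what would go into the "PHP Code" box ---
$unwanted = array('casino', 'lottery');          // hypothetical stop words
$keywords = array(                               // hypothetical keyword => tag map
    'diesel'    => 'diesel',
    'generator' => 'generators',
);

$text = $post['post_title'] . ' ' . $post['post_content'];

// Reject the post entirely when it mentions an unwanted word.
foreach ($unwanted as $word) {
    if (stripos($text, $word) !== false) {
        $post = false;
        break;
    }
}

// Otherwise add a tag for every keyword found in the title or body.
if ($post !== false) {
    foreach ($keywords as $needle => $tag) {
        if (stripos($text, $needle) !== false) {
            $post['tags_input'][] = $tag;
        }
    }
}
```

The check for false before touching $post['tags_input'] matters: once $post has been set to false it is no longer an array.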