Importing archived articles from a legacy system into EPiServer CMS

Introduction

When you're upgrading an existing website to EPiServer CMS, you often need to import the articles from the old website into the new one. I was recently tasked with importing more than 60,000 articles into EPi. All articles have a standard set of properties (title, preamble, main body, etc.) and an image that has to be uploaded to the page's For This Page folder. The articles are exported as XML files and zipped. Our job is to create a tool in admin mode that extracts the zip file, parses the XML files, fetches the images from the remote server and imports everything as EPiServer pages. Sounds fun?

Structuring the content

In this blog post, I'll be using the Alloy Tech MVC sample website. Under the Start page we want an archive container page, with a container page for each year and, under each year, one for each month.
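The year and month container names can be derived from each article's publish date alone. The sketch below isolates that mapping, using the same zero-padded formats ("0000" and "00") as CreateArticlePage in the code further down; ArchivePath is a hypothetical helper for illustration, not part of EPiServer or Alloy.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not an EPiServer API): maps a publish date to the names
// of the year and month container pages, e.g. 12 June 2010 -> ("2010", "06").
public static class ArchivePath
{
    public static (string Year, string Month) For(DateTime publishDate)
    {
        // Same formats as CreateArticlePage: fixed-width, zero-padded numbers,
        // formatted with the invariant culture so the names never vary by server locale.
        string year = publishDate.Year.ToString("0000", CultureInfo.InvariantCulture);
        string month = publishDate.Month.ToString("00", CultureInfo.InvariantCulture);
        return (year, month);
    }
}
```

An article published on 12 June 2010 would therefore end up under Archive / 2010 / 06.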

The first thing we need to do is create a new page type for the archive container page. Under Models/Pages, I'll create a new class called ImportedArticlesContainerPage:

using EPiServer.Core;
using EPiServer.DataAnnotations;

namespace EPiServer.Templates.Alloy.Models.Pages
{
    [ContentType(
        GUID = "5a8dc26b-0fa6-490a-b4fb-817d5e2d0511")]
    [AvailableContentTypes(
        Include = new[] { typeof(ContainerPage) })]
    public class ImportedArticlesContainerPage : PageData
    {
    }
}

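Next, we need to modify the StartPage type so that ImportedArticlesContainerPage pages can be created under the Start page, typically by adding the type to the Include list of the [AvailableContentTypes] attribute on StartPage. The importer also relies on an ImportedArticlePage page type (not shown here) with Title, Preamble, MainBody (XhtmlString) and ArticleImage (ContentReference) properties.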

Exported articles

All articles are zipped. You can download a sample file from this link.

This is what a sample article looks like:

<?xml version="1.0" encoding="ISO-8859-1"?>
<io>
  <article id="XXX" lastmodified="2010-06-12 11:57:22.0" publishdate="2010-06-12 11:57:22.0">
    <field name="TITLE">Article 1</field>
    <field name="PREAMBLE">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</field>
    <field name="MAINBODY">
      <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
      <p>Pellentesque blandit tincidunt porta.</p>
      <p>Etiam a leo in eros consequat bibendum ut a nibh.</p>
      <p>Aliquam augue orci, vestibulum at sollicitudin at, malesuada non neque.</p>
    </field>
    <field name="ARTICLEIMAGE">http://dummyimage.com/600x400/3978f7/000000.png</field>
  </article>
</io>
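Before wiring anything into EPiServer, it can help to parse one of these files into plain values. The sketch below is a hedged illustration using LINQ to XML, assuming the structure shown above (an io root, one article element, named field children). ExportedArticle and ArticleParser are hypothetical names. Note that for MAINBODY, XElement.Value would return only the text and drop the p tags, so fields with child elements are rebuilt from their child nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

// Hypothetical container for one exported article (illustration only).
public sealed class ExportedArticle
{
    public DateTime PublishDate { get; set; }
    public Dictionary<string, string> Fields { get; } = new Dictionary<string, string>();
}

public static class ArticleParser
{
    public static ExportedArticle Parse(string xml)
    {
        XElement article = XDocument.Parse(xml).Root?.Element("article")
                           ?? throw new FormatException("Missing <io>/<article> element");

        var result = new ExportedArticle();

        // publishdate looks like "2010-06-12 11:57:22.0"; fall back to today if absent.
        var publishDate = (string)article.Attribute("publishdate");
        result.PublishDate = string.IsNullOrWhiteSpace(publishDate)
            ? DateTime.Today
            : DateTime.Parse(publishDate, CultureInfo.InvariantCulture);

        foreach (XElement field in article.Elements("field"))
        {
            var name = (string)field.Attribute("name");
            if (name == null) continue;

            // Fields with child elements (MAINBODY) keep their markup by concatenating
            // the serialized child nodes; plain-text fields use the element's text.
            result.Fields[name] = field.HasElements
                ? string.Concat(field.Nodes())
                : field.Value.Trim();
        }

        return result;
    }
}
```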

Admin tool

Now we want to create a tool in EPi admin mode and give it an EPiServer look and feel.

Under the project root, create AdminTools/ArticleImporter.aspx (with its code-behind file, ArticleImporter.aspx.cs).

Then install the SharpZipLib NuGet package, which we'll use to read the zip file.
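SharpZipLib works fine for this. If the project targets .NET 4.5 or later, the built-in System.IO.Compression.ZipArchive is an alternative that avoids the extra package. The sketch below is that swapped-in approach, not the code used in this post; ZipArticleReader is a hypothetical name. It skips directory and non-XML entries and uses XDocument.Load(Stream), which decodes the bytes according to the encoding declared in the XML prolog (ISO-8859-1 in the sample).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical reader built on System.IO.Compression instead of SharpZipLib.
public static class ZipArticleReader
{
    // Yields one XDocument per non-empty .xml entry in the archive.
    // The archive (and the stream passed in) is disposed when enumeration ends.
    public static IEnumerable<XDocument> ReadArticles(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                // Directory entries have length 0; other file types are ignored.
                if (entry.Length == 0 ||
                    !entry.Name.EndsWith(".xml", StringComparison.OrdinalIgnoreCase))
                {
                    continue;
                }

                using (Stream entryStream = entry.Open())
                {
                    // Loading from the raw stream lets the XML reader honour the
                    // encoding declared in <?xml ... encoding="..."?>.
                    yield return XDocument.Load(entryStream);
                }
            }
        }
    }
}
```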

Code

ArticleImporter.aspx

<%@ Page Language="c#" Codebehind="ArticleImporter.aspx.cs"
    AutoEventWireup="False" Title="Article Importer"
    Inherits="EPiServer.Templates.Alloy.AdminTools.ArticleImporter" %>
<asp:content contentplaceholderid="MainRegion" runat="server">
    <div class="epi-formArea">
        <div class="epi-buttonContainer">
            <span class="epi-cmsButton">
                <asp:Button runat="server" ID="btnBeginImport"
                            CssClass="epi-cmsButton-text epi-cmsButton-tools epi-cmsButton-Import"
                            OnClick="BeginImportOnClick"
                            Text="Begin Import"/>
            </span>
        </div>
    </div>
</asp:content>

ArticleImporter.aspx.cs

using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Text;
using System.Xml.Linq;
using System.Xml.XPath;
using EPiServer.Core;
using EPiServer.DataAccess;
using EPiServer.Framework.Blobs;
using EPiServer.PlugIn;
using EPiServer.Security;
using EPiServer.ServiceLocation;
using EPiServer.Shell.WebForms;
using EPiServer.Templates.Alloy.Models.Media;
using EPiServer.Templates.Alloy.Models.Pages;
using EPiServer.UI;
using ICSharpCode.SharpZipLib.Zip;

namespace EPiServer.Templates.Alloy.AdminTools
{
    [GuiPlugIn(
        DisplayName = "Article importer",
        Description = "Used to import old articles from XML files",
        Area = PlugInArea.AdminMenu,
        Url = "~/AdminTools/ArticleImporter.aspx")]
    public partial class ArticleImporter : SystemPageBase
    {
        private readonly IContentRepository _contentRepository;
        private readonly ContentAssetHelper _contentAssetHelper;

        public ArticleImporter()
        {
            _contentRepository = ServiceLocator.Current.GetInstance<IContentRepository>();
            _contentAssetHelper = ServiceLocator.Current.GetInstance<ContentAssetHelper>();
        }

        protected override void OnPreInit(EventArgs e)
        {
            base.OnPreInit(e);
            MasterPageFile = ResolveUrlFromUI("MasterPages/EPiServerUI.master");

            SystemMessageContainer.Heading = "Article importer";
            SystemMessageContainer.Description = "Upload the exported_articles.zip file to the App_Data folder";
        }

        protected override void OnInit(EventArgs e)
        {
            // we need to make sure that only admins have access to this tool
            if (!PrincipalInfo.HasAdminAccess)
            {
                AccessDenied();
            }

            base.OnInit(e);
        }

        protected void BeginImportOnClick(object sender, EventArgs e)
        {
            // check if exported_articles.zip file exists in App_Data folder
            string zipFilePath = Server.MapPath("~/App_Data/exported_articles.zip");
            if (!File.Exists(zipFilePath))
            {
                SystemMessageContainer.Message = "Zip file is missing";
                SystemMessageContainer.MessageStyle = MessageType.Warning;
                return;
            }

            var importedArticlesContainerPage =
                GetOrCreateContainerPage<ImportedArticlesContainerPage>(ContentReference.StartPage, "Archive");

            using (var zipFile = new ZipFile(zipFilePath))
            {
                foreach (ZipEntry zipEntry in zipFile)
                {
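                    // skip directory entries; only file entries contain articles
                    if (!zipEntry.IsFile)
                    {
                        continue;
                    }

                    // the export declares encoding="ISO-8859-1", so decode it as such
                    // instead of relying on the server's default code page
                    using (var zipEntryStream = zipFile.GetInputStream(zipEntry))
                    using (var streamReader = new StreamReader(zipEntryStream, Encoding.GetEncoding("ISO-8859-1")))
                    {
                        string content = streamReader.ReadToEnd();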

                        if (!string.IsNullOrWhiteSpace(content))
                        {
                            var xmlDocument = XDocument.Parse(content);

                            DateTime publishedDate = DateTime.Today;
                            string title = string.Empty;

                            // get publish date
                            var rootElement = xmlDocument.XPathSelectElement("//io/article");
                            if (rootElement != null)
                            {
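                                // the cast returns null instead of throwing if the attribute is missing
                                string publishDate = (string)rootElement.Attribute("publishdate");
                                if (!string.IsNullOrWhiteSpace(publishDate))
                                {
                                    // e.g. "2010-06-12 11:57:22.0"; parse independently of the server culture
                                    publishedDate = DateTime.Parse(publishDate, System.Globalization.CultureInfo.InvariantCulture);
                                }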
                            }

                            // get title
                            var titleElement = xmlDocument.XPathSelectElement("//io/article/field[@name='TITLE']");
                            if (titleElement != null)
                            {
                                title = titleElement.Value;
                            }
							
                            // save the page first: it needs a ContentLink before we can set the publish date and upload the image to its For This Page folder
                            var articlePage = CreateArticlePage(importedArticlesContainerPage.ContentLink,
                                                                publishedDate, title);


                            // get preamble
                            var preambleElement = xmlDocument.XPathSelectElement("//io/article/field[@name='PREAMBLE']");
                            if (preambleElement != null)
                            {
                                articlePage.Preamble = preambleElement.Value;
                            }

                            // get main body
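                            var mainBodyElement = xmlDocument.XPathSelectElement("//io/article/field[@name='MAINBODY']");
                            if (mainBodyElement != null)
                            {
                                // XElement.Value returns only the text and drops the <p> tags,
                                // so concatenate the child nodes to keep the markup
                                articlePage.MainBody = new XhtmlString(string.Concat(mainBodyElement.Nodes()));
                            }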

                            // get article image
                            var articleImageElement =
                                xmlDocument.XPathSelectElement("//io/article/field[@name='ARTICLEIMAGE']");
                            if (articleImageElement != null)
                            {
                                string imageUrl = articleImageElement.Value;
                                DownloadImage(articlePage, imageUrl);
                            }

                            // publish the page
                            _contentRepository.Save(articlePage, SaveAction.Publish, AccessLevel.NoAccess);
                        }
                    }
                }
            }
        }

        private ImportedArticlePage CreateArticlePage(ContentReference archiveContainerPage, DateTime publishDate,
                                                      string name)
        {
            string year = publishDate.Year.ToString("0000");
            string month = publishDate.Month.ToString("00");

            var yearContainerPage = GetOrCreateContainerPage<ContainerPage>(archiveContainerPage, year);
            var monthContainerPage = GetOrCreateContainerPage<ContainerPage>(yearContainerPage.ContentLink, month);

            var articlePage = _contentRepository.GetDefault<ImportedArticlePage>(monthContainerPage.ContentLink);
            articlePage.Name = name;

            _contentRepository.Save(articlePage, SaveAction.Save, AccessLevel.NoAccess);

            articlePage.Title = name;
            articlePage.StartPublish = publishDate;

            return articlePage;
        }

        private T GetOrCreateContainerPage<T>(ContentReference parentPage, string name) where T : PageData
        {
            var containerPage = _contentRepository
                .GetChildren<T>(parentPage)
                .FirstOrDefault(x => x.Name == name);

            if (containerPage == null)
            {
                containerPage = _contentRepository.GetDefault<T>(parentPage);
                containerPage.Name = name;

                _contentRepository.Save(containerPage, SaveAction.Publish, AccessLevel.NoAccess);
            }

            return containerPage;
        }

        private void DownloadImage(ImportedArticlePage articlePage, string imageUrl)
        {
            if (string.IsNullOrEmpty(imageUrl))
            {
                return;
            }

            try
            {
                // get an existing content asset folder or create a new one
                var assetsFolder = _contentAssetHelper.GetOrCreateAssetFolder(articlePage.ContentLink);

                // parse the image name and extension from the URL path,
                // e.g. ".../000000.png" gives "000000.png" and ".png"
                string imageName = Path.GetFileName(new Uri(imageUrl).AbsolutePath);
                string imageExtension = Path.GetExtension(imageName);

                var blobFactory = ServiceLocator.Current.GetInstance<BlobFactory>();
                var imageFile = _contentRepository.GetDefault<ImageFile>(assetsFolder.ContentLink);
                imageFile.Name = imageName;

                var webRequest = (HttpWebRequest)WebRequest.Create(imageUrl);

                // dispose the response so the underlying connection is released
                using (var webResponse = (HttpWebResponse)webRequest.GetResponse())
                {
                    // Check that the remote file was found. The ContentType
                    // check is performed since a request for a non-existent
                    // image file might be redirected to a 404-page, which would
                    // yield the StatusCode "OK", even though the image was not
                    // found.
                    if ((webResponse.StatusCode == HttpStatusCode.OK ||
                         webResponse.StatusCode == HttpStatusCode.Moved ||
                         webResponse.StatusCode == HttpStatusCode.Redirect) &&
                        webResponse.ContentType.StartsWith("image", StringComparison.OrdinalIgnoreCase))
                    {
                        // upload the image
                        using (var imageStream = webResponse.GetResponseStream())
                        {
                            var blob = blobFactory.CreateBlob(imageFile.BinaryDataContainer, imageExtension);
                            blob.Write(imageStream);
                            imageFile.BinaryData = blob;

                            articlePage.ArticleImage = _contentRepository.Save(imageFile, SaveAction.Publish,
                                                                               AccessLevel.NoAccess);
                        }
                    }
                }
            }
            catch
            {
                // TODO: log
            }
        }
    }
}
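With 60,000 articles, it is worth testing the two pure decisions inside DownloadImage (file name/extension parsing and the "is this really an image?" check) without EPiServer or a network. The sketch below pulls them into a hypothetical ImageUrlHelper class; it mirrors the logic of DownloadImage under that assumption and is not something the importer above calls.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper (illustration only) mirroring the checks in DownloadImage.
public static class ImageUrlHelper
{
    // "http://dummyimage.com/600x400/3978f7/000000.png" -> ("000000.png", ".png").
    // Only the URL path is used, so a query string such as "?v=2" is ignored.
    public static (string Name, string Extension) GetImageName(string imageUrl)
    {
        string name = Path.GetFileName(new Uri(imageUrl).AbsolutePath);
        return (name, Path.GetExtension(name));
    }

    // Accept only OK/redirect responses whose Content-Type is an image, so an
    // HTML error page served with status 200 is not stored as an image.
    public static bool IsImageResponse(HttpStatusCode status, string contentType)
    {
        bool statusOk = status == HttpStatusCode.OK
                        || status == HttpStatusCode.Moved
                        || status == HttpStatusCode.Redirect;

        return statusOk
               && contentType != null
               && contentType.StartsWith("image", StringComparison.OrdinalIgnoreCase);
    }
}
```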

 

Running the tool

Copy exported_articles.zip to the wwwroot/App_Data folder, log in to admin mode and run the Article importer tool.

If everything went OK, you should see the articles under the Archive page, grouped into year and month container pages.

Each article's image is stored in that article's For This Page folder.
