Thursday, December 25, 2008

Converting a Html News Page to RSS

This has been my little weekend project (OK, along with some Christmas preparations).

Background

As some of you know, I play underwater rugby. The communication from the Swedish Underwater Rugby Association to its members is mainly through the news page on their site: http://www.ssdf.se/t3.aspx?p=51459 (in Swedish). This page only exposes HTML and has no RSS feed (they should have used SharePoint). My problem is that I never remember to visit the site at regular intervals, so I miss out on stuff.

Approach

As I truly am an RSS junkie, that’s what I wanted: to be able to get this news (together with all the other news I’m interested in) in my feed reader. So here is the approach I took, outlined:

  1. Get HTML from the news page
  2. Make sense out of and parse HTML
  3. Generate RSS XML and save to file
  4. Expose the RSS on my own web server

Doing all the parsing by hand didn’t sound very tempting, so I searched around a little and found the HtmlAgilityPack on CodePlex, which is a framework that lets you query an HTML document in the same way you would query an XML document using XSLT or XPath. The release on CodePlex was compiled against the 2.0 framework; I simply changed the target framework for the project and recompiled, and it worked like a charm.

 

Let’s Get Going

Getting the HTML

The HtmlAgilityPack supports getting the HTML itself by using something like:

HtmlWeb hw = new HtmlWeb();
HtmlDocument newsDoc = hw.Load(url);

The problem I had with that (and it’s probably due to incompetence on my part) is that I could not get the right encoding (very important in Swedish due to our extended alphabet). So what I ended up doing was getting the HTML myself and loading it into an HtmlAgilityPack HtmlDocument object:

// Did not manage to solve the encoding bit so I retrieve the data myself first ...
HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(urlToFetch);
HtmlDocument htmlDocument = new HtmlDocument();

// ... and then apply the encoding while reading the stream into the HtmlAgilityPack object.
// The using block makes sure the response (and its connection) is released afterwards.
using (HttpWebResponse webResponse = (HttpWebResponse) webRequest.GetResponse())
{
    htmlDocument.Load(webResponse.GetResponseStream(), Encoding.Default);
}
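
Encoding.Default happens to work on my machine, but it depends on the Windows code page of whatever box the program runs on. A possible refinement, as a minimal sketch only: use the charset the server declares and fall back to a default when it is missing or unknown. The class and method names below (ResponseEncoding, Resolve) are made up for this illustration, and I have not checked what charset this particular site actually declares.

```csharp
using System;
using System.Text;

public static class ResponseEncoding
{
    // Returns the encoding named by the declared charset, or the fallback
    // when the charset is missing or not known to the runtime.
    public static Encoding Resolve(string charset, Encoding fallback)
    {
        if (string.IsNullOrEmpty(charset))
        {
            return fallback;
        }

        try
        {
            // Some servers wrap the charset name in quotes, so strip them first
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name
            return fallback;
        }
    }
}
```

With this in place, the second argument to htmlDocument.Load could be ResponseEncoding.Resolve(webResponse.CharacterSet, Encoding.Default), since HttpWebResponse exposes the declared charset through its CharacterSet property.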

 

Parsing the HTML

Now it’s time to leverage the power of the HtmlAgilityPack, but first I did a manual analysis of the HTML using View Source and the IE Developer Toolbar. I found that I could identify each news item by looking for a DIV tag with the class attribute set to clMainnewsEntries.

Html2Rss_IEDevToolbar

So, let’s get cracking and find those nodes:

HtmlNodeCollection htmlNodeCollection = htmlDocument.DocumentNode.SelectNodes("//div[@class='clMainnewsEntries']");

foreach (HtmlNode newsNode in htmlNodeCollection)
{
    // ... generate rss items ...
}

That was easy. Now the HTML starts working against me. A few issues:

No Author

The news items have no author that I can easily get to programmatically. But it ain’t really important either, so I’m just setting it to “N/A”.

No Links

Not all news items have links, and if they do it’s hard to tell if it’s a link to the news item or something else. So to fill the link element in the RSS I try to find a link that has a title attribute (which seems to be the way this CMS handles read-more links).

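string link = string.Empty;

// Look for the first anchor with a non-empty title attribute (the CMS's "read more" link)
HtmlNode linkNode = newsNode.SelectSingleNode(".//a[@title!='']");
if (linkNode != null)
{
    link = linkNode.Attributes["href"].Value;

    // Relative links need the site address in front of them
    if (!link.StartsWith("http"))
    {
        link = String.Format("{0}{1}", "http://www.ssdf.se/", link);
    }
}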
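
Gluing strings together works for the links on this page, but the Uri class can do the combining for you. Here is a small sketch of that alternative; LinkResolver and ToAbsolute are names I made up for the example, and it is not what the downloadable version does.

```csharp
using System;

public static class LinkResolver
{
    // Combines a base address with an href taken from the page.
    // An href that is already absolute is returned unchanged, and an
    // empty string is returned when the two cannot be combined.
    public static string ToAbsolute(string baseUrl, string href)
    {
        Uri absolute;
        if (Uri.TryCreate(new Uri(baseUrl), href, out absolute))
        {
            return absolute.AbsoluteUri;
        }

        return string.Empty;
    }
}
```

Usage would be something like link = LinkResolver.ToAbsolute("http://www.ssdf.se/", link); which removes the need for the StartsWith("http") check.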

No Publishing Date

This one is trickier. To add to the confusion, I learned that the editors update a news item when they want to push it to the top of the list. So what I do here is simply use the date and time when I retrieve an item for the first time, keeping track of the items with a hash (see the next section). This should work fine when it runs at a steady interval, though the first run will give all news items the same date.

Guid

To keep track of the items I calculate a hash for each item and store that in a separate XML file.

public string ComputeHash(string value)
{
    // UTF-8 rather than ASCII, so that å, ä and ö are not all collapsed into '?'
    byte[] data = System.Text.Encoding.UTF8.GetBytes(value);

    using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
    {
        byte[] hash = md5.ComputeHash(data);

        // Format each byte as two lowercase hex digits
        System.Text.StringBuilder ret = new System.Text.StringBuilder(hash.Length * 2);
        for (int i = 0; i < hash.Length; i++)
        {
            ret.Append(hash[i].ToString("x2"));
        }
        return ret.ToString();
    }
}

I put this hash in the guid element of the RSS. So if a news item is updated, I hope they change something in it so that it renders a different hash.
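
The first-seen bookkeeping described under No Publishing Date can be sketched roughly like this. It is a simplified, in-memory illustration: the real program stores the hashes in a separate XML file, which is left out here, and the FirstSeenTracker name is invented for the example.

```csharp
using System;
using System.Collections.Generic;

public class FirstSeenTracker
{
    // Maps an item hash to the date and time the item was first retrieved
    private readonly Dictionary<string, DateTime> firstSeen =
        new Dictionary<string, DateTime>();

    // Returns the stored date for a known hash. An unknown hash is
    // recorded with 'now' and gets that as its publishing date.
    public DateTime GetPubDate(string postHash, DateTime now)
    {
        DateTime seen;
        if (!firstSeen.TryGetValue(postHash, out seen))
        {
            seen = now;
            firstSeen[postHash] = seen;
        }

        return seen;
    }
}
```

An item whose content changes gets a new hash, so it shows up as a new entry with a new date, which is the behaviour I’m hoping for when the editors update something.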

 

Building the RSS

It’s time to start building the RSS. I start by creating the document using LinqToXml (which by the way is pure love to use and deserves a blog post all of its own):

// Creating XDocument
XDocument xDocument = new XDocument(
    new XDeclaration("1.0", "windows-1252", "yes"),
    new XProcessingInstruction("xml-stylesheet", "type=\"text/xsl\" href=\"EvelntLog.xsl\"" ),
    new XElement("rss", new XAttribute("version", "2.0"),
         new XElement("channel",
                      new XElement("title", "UV-rugbynyheter"),
                      new XElement("link", HtmlDocument.HtmlEncode( "http://www.ssdf.se/t3.aspx?p=51459") ),
                      new XElement("description", "Undervattensrugbynyheter från SSDF"),
                      new XElement("language", "sv-se")
             )
        )
    );

And then I add each item to the feed:

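// root is the element the items are added to (in RSS 2.0, items belong inside channel)
root.Add(new XElement("item",
      new XElement("title", newsNode.SelectSingleNode("h1").InnerHtml),
      new XElement("description", newsNode.OuterHtml),
      new XElement("link", link),
      new XElement("author", "N/A"),
      new XElement("pubDate", pubDate.ToString("r")),
      // The hash is not a URL, so mark the guid as not being a permalink
      new XElement("guid", new XAttribute("isPermaLink", "false"), postHash)
));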

 

Finalizing

I put this little program on my web server and used the Windows scheduler to run it every 2 hours. The final piece of code pushes the generated file out to the right directory.

if(Convert.ToBoolean(ConfigurationManager.AppSettings["CopyFile"]))
{
    File.Copy(".\\" + ConfigurationManager.AppSettings["Filename"], 
        Path.Combine(ConfigurationManager.AppSettings["TargetDir"], 
        ConfigurationManager.AppSettings["Filename"]), true);
}

You can grab the source code for the first working version here. Now it’s refactor time!

Tuesday, December 16, 2008

Getting into Twitter

Everybody is talking about Twitter, tweet this and tweeting that. And I haven’t really got the point yet. So this Saturday during a chili cook-out my friends Mårten and Johan tried to explain it to me.

They did a job good enough to make me give it a try anyway. So I’m now on http://twitter.com/nippe

I haven’t said a twittering word yet, I think I’m going to listen for a while first to get the gist of it.

So I got an account, what’s the next thing you do? See if you can find some cool apps to get you on your way, of course. I found Witty, which seems to be a nice open source WPF app. And hey, glossy buttons always make adoption easier :).

Sunday, December 14, 2008

Adobe PDF iFilter now in 64-bit

Finally! Adobe released a 64 bit version of the PDF iFilter. You can get it here: http://www.adobe.com/support/downloads/detail.jsp?ftpID=4025.

But before you decide if this is the iFilter you wanna go with, check out Je Li’s performance measurements comparing Adobe’s and Foxit’s iFilters: http://blogs.msdn.com/opal/archive/2008/12/10/pdf-ifilter-battle-foxit-vs-adobe-64bit-version.aspx

Friday, November 07, 2008

3 Weeks with Windows Server 2008 on My Laptop

A couple of weeks ago I felt that it was time to give my laptop (Dell Latitude D830) a clean install. I’ve been running Vista 64-bit on it and it has worked great. I’ve also been curious about Hyper-V and not totally satisfied with Virtual PC (the VMs are often too slow).

Said and done, I installed Windows Server 2008 Standard Edition and got on with configuring it and installing features and roles. That WS2008 is not a client OS is a given, but the amount of configuration needed to get it to work like one was a little over the top for my taste. But hey, what won’t you go through to learn something new? There are a ton of articles out there describing the steps; here are a few that I followed:

After getting WiFi, Aero and sound working I struggled with Bluetooth, which I never got fully working. That is a big setback because I use (and love) a Bluetooth mouse.

Next step for me was installing the Hyper-V role. It worked great, though it took me a while to figure out how the networking worked and that it doesn’t work with wireless the way Virtual PC does. The big drawback for me was that when Hyper-V is installed it disables the laptop’s ability to Sleep and Hibernate, and I just love to be able to slam down the lid, go some place, open it up and continue where I was.

So overall I’m happy with it, but the Bluetooth and Sleep/Hibernate issues are deal breakers for me, so I’m going back to Vista 64. Had I had a desktop computer I would probably have stuck with Windows Server 2008, because Hyper-V really rocks!

Thursday, November 06, 2008

Shortcuts that Save Time

Hi there, this post is about being a little more efficient in your everyday work. These shortcuts maybe save me half a second every time I use them, but I do use them literally a hundred times a day.

 

Ctrl + E

Using the Ctrl + E keyboard shortcut gets you to the search box in many major applications, such as Internet Explorer, Mozilla Firefox and Outlook 2007.

Some examples:

Outlook
Ctrl + E in Outlook

Internet Explorer
Ctrl + E in Internet Explorer

FireFox
Ctrl + E in FireFox

 

ALT + D

A not so well-known shortcut is ALT + D, which takes you to the address field of your browser (I have only verified this in IE and FF). A nice effect is that it selects the whole address, so press ALT + D, start typing and you’re entering a new address.

Internet Explorer
ShortCuts_AltD_ie

FireFox
ShortCuts_AltD_FF

 

Do you have any productivity-boosting shortcuts?

Wednesday, November 05, 2008

A Sweet Little File Rename Utility

Every now and then I want to rename files in a folder according to a pattern. Mostly it is to get audio books working on my iPod. Then I have to rename all the files with the .m4a extension to .m4b (and don’t ask me why Apple chose this idiotic solution).

This is easily done with a DOS command in Windows (ren *.m4a *.m4b). But when you want to change file names according to a pattern it gets trickier. I stumble across this sometimes when I want to rename digital photos and such.

Anyway, I found this utility via Lifehacker and it’s called KRename. I’ve just played around with it a little, but for the things I needed off the bat it came through.

KRename

Tuesday, October 14, 2008

Consolas Font in Command Prompt

I am a font geek and I just love the Consolas font. I started using Consolas in Visual Studio and now I want it in more places. So I set out on a mission to get Consolas into my Command Prompt.

Here’s how:

  1. Get Consolas Font Pack and install it
  2. Crank up regedit and browse to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Console\TrueTypeFont
  3. Add a string value named “00”
  4. Add “Consolas” as value
    ConsolasPost_Regedit
  5. Restart your computer

And voilà, Consolas in the command prompt:

Before

ConsolasPost_CmdBefore

After

ConsolasFont_CmdAfter

Thursday, August 28, 2008

Getting a "403 Forbidden" when trying to access the Search Settings Page

A while ago I stumbled upon this issue. Doing my usual day-to-day stuff on the SharePoint farm, I entered the Search Settings page under the SSP and instantly got a 403 Forbidden in my face. We had not experienced any problems with this earlier. So I looked around a little, and the Profile Import page also showed immense weirdness:

Broken SSP User Profiles

I found some articles on similar problems but that did not help.

 

After trying a bunch of things I gave in and thought to myself: f**k it, I'll just create a new SSP and configure search once again. Said and done, I started creating a new SSP only to get a failure, but with some interesting info: "User cannot be found".

CreateSSPFailure

This set me off in the right direction. After reading through an enormous amount of ULS logs and a substantial amount of support case logs I finally figured out what caused it all.

 

When you install a new SharePoint farm, the account used to do the installation is appointed as the owner of the central admin site collection. We did not use a dedicated installation account, and the guy who did the installation for our farm had left the place a while ago. Hence the IT department had locked his account, and that is what caused it all.

 

So, lesson learned: always use a dedicated setup account!

Wednesday, April 09, 2008

Installing MOSS SP1 - The Order of Things

In my current project I ran into some DST issues on our MOSS farm last Monday. As you can guess, we had a DST (daylight saving time) change here in Sweden during the weekend just before this very frustrating Monday. I experienced it as huge problems when trying to deploy my wsp solution packages. The thing that tipped me off was that I went to lunch with a deployment stuck in "Deploying..." and when I got back it had succeeded. As you have probably figured out by now, we did not have Service Pack 1 installed.

So I read through the Planning and Deploying Service Pack 1 for Microsoft Office SharePoint Server 2007 in a Multi-server Environment. It contains a lot of good info, but when it comes to the installation procedure and the sequence of things it does not do a very good job. In fact I find it to be quite self-contradictory on some points.

What I ended up doing was creating my own installation matrix, which you can download here:

Install matrix

Monday, March 24, 2008

Stopwatch class

In my current project we’ve reached a phase where we’re hunting down performance issues and are doing some light profiling of our own code. So I’ve been using the Stopwatch class a lot to see how long things take. This post is more of a note to self, so I can find it again in a non-time-consuming fashion :).
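
For the note-to-self part, here is a minimal sketch of the kind of timing I mean. Stopwatch lives in System.Diagnostics; the Timing class and its Measure method are just a small wrapper invented for this example, not something from the project.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the action once and returns the elapsed wall-clock time
    public static TimeSpan Measure(Action action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Usage is along the lines of TimeSpan elapsed = Timing.Measure(() => DoTheSlowThing()); followed by writing elapsed.TotalMilliseconds to the console or a log, where DoTheSlowThing stands in for whatever code is under suspicion.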

Friday, March 14, 2008

Net Stop Sens

This post is mostly a note to self, but hopefully it can save a little time for someone. I was just cranking up a new VPC to demo some document management features for a customer. Browsing the SharePoint webs worked like a charm, and creating document libraries was no problem. But when I tried to save a document from Word 2007 to a document library I got an annoying: "This file cannot be saved to this location because there is no connection to the server. Check your network connection and try again."

WordSaveError 

Also, the document panel failed to load.

My guess is that WebDAV or FrontPage RPC does not handle connections the same way HTTP does. I had no clue what this was, so I spent a little time with my favorite search engine (live.com of course ;)).

Here is what made the error go away: net stop sens.

NetStopSens 

Apparently the System Event Notification Service (SENS) can be used by applications to determine bandwidth and such, and the Office client uses it in some way. I have never encountered this problem in a production environment and even if I did, I don’t think turning off the service is the right solution. But in the case of a demo/development VPC, it’s fine in my book.

Monday, February 11, 2008

Configuration system failed to initialize

Just encountered a problem that’s soooo simple to solve, but yet sooo frustrating when you encounter it.

I was moving some code from my local prototype project up to the official dev environment. At first I forgot some entries in the app.config file. Piece of cake, right?

So off to some cut-n-paste action and I ended up with this:

Bad_Config

 

Looks alright? Well it did to me, so imagine my surprise when I got hit with the following exception:

System.Configuration.ConfigurationErrorsException was unhandled
  Message="Configuration system failed to initialize"
  Source="System.Configuration"
  BareMessage="Configuration system failed to initialize"
  Line=0

I messed around with it for a while thinking I’d introduced some weird character somewhere or something like that. But no sir, that was not the problem. After searching at live.com for a while, I found someone saying that the <configSections> element needs to be the first child of <configuration> in the config file.

Good_Config
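
If you want to catch this before the exception does, the rule is easy to check in code. The following is a rough sketch only: ConfigCheck and ConfigSectionsIsFirst are names I made up, and it takes the config as a string rather than reading a real app.config.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class ConfigCheck
{
    // True when <configSections> is absent, or when it is the first
    // child element of the root <configuration> element.
    public static bool ConfigSectionsIsFirst(string configXml)
    {
        XElement root = XDocument.Parse(configXml).Root;
        XElement first = root.Elements().FirstOrDefault();
        XElement sections = root.Element("configSections");

        return sections == null || sections == first;
    }
}
```

A config where appSettings comes before configSections makes this return false, which is exactly the situation that gave me the exception above.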

Perhaps this can save someone else a little time and frustration.

Monday, January 07, 2008

Iterating Over and Deleting SharePoint Groups Programmatically

Ever wanted to do something with the groups within SharePoint 2007 programmatically? Here is some sample code you might find useful.

Iterating:

public IList<string> GetAllUserGroups(string url)
{
    IList<string> groupList = new List<string>();

    using (SPSite site = new SPSite(url))
    {
        SPGroupCollection groups = site.RootWeb.SiteGroups;
       
        foreach (SPGroup g in groups)
        {
            groupList.Add(g.Name);
        }
    }

    return groupList;
}

Deleting:

public void DeleteUserGroups(IList<string> GroupNames, string url)
{
    using (SPSite site = new SPSite(url))
    {
        SPGroupCollection groups = site.RootWeb.SiteGroups;

        foreach (string groupname in GroupNames)
        {
            groups.Remove(groupname);
        }
    }
}

That code sure could do with some error handling around the remove statement.

Tuesday, November 20, 2007

FeedBurner Acquired by Google

This was news to me, though with my low reader count I don’t hang around FeedBurner much checking my stats. Google acquired FeedBurner. I don’t know if it’s a good or a bad thing; time will tell. But I can draw one quick conclusion: now Google is profiling me even more.

I kinda see where they want to go with this. Apart from the profiling part, I think Technorati might get some serious competition in the blog search space.

Read more about it here.

Monday, November 19, 2007

They are here...

... Final releases of Visual Studio 2008 and .NET Framework 3.5 that is! What did you think I was talking about?

http://weblogs.asp.net/scottgu/archive/2007/11/19/visual-studio-2008-and-net-3-5-released.aspx

Get your friends face(book)s in Outlook

I was vacuuming our apartment the other day and to make it less boring I listened to an episode of the Hanselminutes podcast. Yeah I know, I’ve got a boring life. Anyway, in this episode Scott talked to Mel Sempat (PM at Microsoft Live) about programming Facebook apps. Turns out Mel put together a nifty little tool by the name of Outsync.

I don’t know about you, but I’m an Outlook junkie. I have most of my life there when it comes to calendar, contacts, to-dos and so forth. I’m also on Facebook (who isn’t these days). So what this app does is iterate through your Facebook friends and get their faces. Then it goes through your Outlook contacts and matches them on names (the Facebook API does not expose e-mail addresses for privacy reasons). Then you can choose whose face you want in Outlook.

That’s all neat and good, but the real fun is if you have a Windows Mobile phone syncing with Outlook or Exchange. Then that picture starts appearing all over the place: when that person calls, sends an SMS and so on.

Thursday, November 15, 2007

Stack Data Structure Vs. Recursion

A while ago I had an epiphany around using a stack data structure instead of recursion. Yeah, I know, old news for most of you. But I don’t care, I just got it and it feels good. Anyway, for those of you that haven’t had that revelation yet, maybe this example can help you get there.

Problem

I have a scenario where I have to loop through a directory structure and look for csproj-files. What I do with them is not relevant to this example.

Solution 1 - Recursion

So, being a creature of habit I went for the recursive approach.

class Program
{
    static void Main(string[] args)
    {

        string startAddress = "C:\\SomeDir";
        DirectoryInfo di = new DirectoryInfo(startAddress);

        TraverseForProjectFiles(di);
    }


    private static void TraverseForProjectFiles(DirectoryInfo currentDirectory)
    {
        foreach (FileInfo projFile in currentDirectory.GetFiles("*.csproj"))
        {
            //Do some stuff
            Console.WriteLine("Found file: {0}", projFile.FullName);
        }

        foreach (DirectoryInfo subDirectory in currentDirectory.GetDirectories())
        {
            TraverseForProjectFiles(subDirectory);
        }
    }
}

Solution 2 - Stack Data Structure

When I was done with my little program it hit me, this must be the perfect isolated little sample for trying out the stack approach. The generic stack class is available in the namespace System.Collections.Generic.

class Program
{
    static void Main(string[] args)
    {
        Stack<DirectoryInfo> dirStack = new Stack<DirectoryInfo>();
        string startAddress = "C:\\SomeDir";
        DirectoryInfo di = new DirectoryInfo(startAddress);

        dirStack.Push(di);

        while (dirStack.Count > 0)
        {
            DirectoryInfo currentDirectory = dirStack.Pop();

            foreach (FileInfo projFile in currentDirectory.GetFiles("*.csproj"))
            {
                //Do some stuff
                Console.WriteLine("Found file: {0}", projFile.FullName);
            }

            foreach (DirectoryInfo subDir in currentDirectory.GetDirectories())
            {
                dirStack.Push(subDir);
            }
        }
    }
}

Comparison

So which of the two methods is the best? Neither or both... Right tool for the right job and all that. Personally I think that if the scenario was a little more complex, which it usually is out there in the real world, the Stack approach has a higher level of readability and probably debuggability. But that’s just my opinion. The takeaway from this post is that recursive problems can often be solved using the Stack data structure.

Friday, November 09, 2007

New Search Products Announced

I'm a little late out of the starting blocks here. On the 6th of November Microsoft publicly announced a couple of new products, namely the Search Server 2008 family.

The big news here IMHO is the Express version, which (like the rest of the Express products) is free. This will get a lot of people to try out search without making a very big investment. Meaning you still have to have a server to run it on and the time to set it up.

More on SharePoint Backup/Restore

First off we released a new whitepaper on the subject:

Short, neat title, eh? Anyway, it’s a really good read in an area where guidance has been somewhat mediocre.

Second, Data Protection Manager (DPM) 2007 has reached RTM, and it has the ability to help protect data on your SharePoint farms.

Sunday, October 07, 2007

Stsadm backup/restore

Nice writeup by Bob Fox on how to use SharePoint’s built-in backup tools (stsadm -o backup). He also talks about scheduling your backup jobs and verifying them.

 

 

Copyright © Niklas Nihlen