Singapore public holiday information is available on the Ministry of Manpower (MOM) website. Currently we can get it from three locations:
- ICS File
http://www.mom.gov.sg/Documents/employment-practices/public-holidays/public_holidays.sg.YYYY.ics
- Web page
http://www.mom.gov.sg/employment-practices/leave-and-holidays/Pages/public-holidays-YYYY.aspx
- Web page beta version
http://www.mom.gov.sg/beta/public-holidays.html
For the sake of brevity, I will only use the data from the beta version of the website. If you are interested in parsing the ICS file, that wheel has already been invented.
Alright, first we need to add HtmlAgilityPack to our project:
PM> Install-Package HtmlAgilityPack
Next, we view the source of http://www.mom.gov.sg/beta/public-holidays.html:
<!---- Lots of other stuffs -->
<div id=Public-holidays-2013 class=tab>
  <!---- other stuffs -->
  <tbody>
    <tr>
      <td>1 January 2014</td>
      <td>Wednesday</td>
      <td><img src="assets/images/public-holiday/new_year.png" alt="" width=52></td>
      <td class=cell-holiday-name>New Year’s Day<span class=text-date-mobile>1 January 2014, Wednesday</span></td>
    </tr>
    <tr>
      <td>31 January 2014 <br>1 February 2014</td>
      <td>Friday <br>Saturday</td>
      <td><img src="assets/images/public-holiday/cny.png" alt="" width=52></td>
      <td class=cell-holiday-name>Chinese New Year<span class=text-date-mobile>31 January 2014 - 1 February 2014, Friday - Saturday </span></td>
    </tr>
<!---- Lots of other stuffs -->
From the markup above, we can note that:
- The public holidays for each year are located under a div with the id Public-holidays-YYYY
- Each public holiday is in its own tr
- The first td contains the date, except for Chinese New Year, where it contains two dates separated by <br>
- The last td contains the public holiday name, followed by a span that repeats the date for mobile layouts
With that in mind, we can decide how to parse the page. Before that, let’s create the class to hold the data:
public class PublicHoliday
{
    public DateTime Date { get; set; }
    public String Name { get; set; }
}
And now let’s construct the method to parse the HTML:
// Requires: using System; using System.Collections.Generic; using System.Linq; using HtmlAgilityPack;
private List<PublicHoliday> RetrievePublicHolidays(DateTime dt, HtmlDocument doc)
{
    var yearID = "Public-holidays-" + dt.Year.ToString();
    var result = new List<PublicHoliday>();

    // SelectNodes returns null (not an empty collection) when nothing matches
    var rows = doc.DocumentNode.SelectNodes("//div[@id='" + yearID + "']/table/tbody/tr");
    if (rows == null)
    {
        return result;
    }

    foreach (var tr in rows)
    {
        var firstTd = tr.ChildNodes.First(x => x.Name == "td");
        var lastTd = tr.ChildNodes.Last(x => x.Name == "td");
        // The holiday name is the first text node, before the mobile-only span
        var name = lastTd.ChildNodes.First().InnerText.Trim();

        DateTime curDt;
        if (DateTime.TryParse(firstTd.InnerText, out curDt))
        {
            result.Add(new PublicHoliday { Date = curDt, Name = name });
        }
        else if (firstTd.InnerHtml.Contains("<br>"))
        {
            // Special treatment for Chinese New Year: two dates separated by <br>
            if (DateTime.TryParse(firstTd.ChildNodes.First().InnerText, out curDt))
            {
                result.Add(new PublicHoliday { Date = curDt, Name = name });
            }
            if (DateTime.TryParse(firstTd.ChildNodes.Last().InnerText, out curDt))
            {
                result.Add(new PublicHoliday { Date = curDt, Name = name });
            }
        }
    }

    return result;
}
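One thing to keep in mind is that DateTime.TryParse uses the current culture, so the same text may parse differently on a machine with other regional settings. As a rough sketch only, the date cell could instead be handled by a small helper that splits on <br> and parses each piece with a fixed format. The names HolidayDateParser and ParseDates are made up for this illustration, and the "d MMMM yyyy" format and the literal "<br>" separator are assumptions based on the markup shown earlier:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HolidayDateParser
{
    // Hypothetical helper: takes the inner HTML of the date cell,
    // e.g. "31 January 2014 <br>1 February 2014", and returns every date it can parse.
    public static List<DateTime> ParseDates(string cellHtml)
    {
        var dates = new List<DateTime>();

        // Assumption: multiple dates are separated by a literal "<br>" as in the sample markup
        var parts = cellHtml.Split(new[] { "<br>" }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var part in parts)
        {
            DateTime parsed;
            // Assumption: dates look like "1 January 2014"; parsed with the invariant culture
            if (DateTime.TryParseExact(part.Trim(), "d MMMM yyyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
            {
                dates.Add(parsed);
            }
        }

        return dates;
    }
}
```

With something like this, the single-date rows and the Chinese New Year row could go through the same code path: call ParseDates on the first td's InnerHtml and add one PublicHoliday per returned date.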
Finally, let’s put the method above into action:
// Requires: using System; using System.IO; using System.Net; using HtmlAgilityPack;
var request = WebRequest.Create("http://www.mom.gov.sg/beta/public-holidays.html");
string content;
using (var response = request.GetResponse())
using (var resStream = response.GetResponseStream())
{
    if (resStream == null)
    {
        throw new Exception("Response Stream is empty");
    }
    using (var reader = new StreamReader(resStream))
    {
        content = reader.ReadToEnd();
    }
}

// Now load the HTML
var doc = new HtmlDocument();
doc.LoadHtml(content);

// Retrieve last year's public holidays
var lastYear = RetrievePublicHolidays(DateTime.Now.AddYears(-1), doc);
// Retrieve this year's public holidays
var thisYear = RetrievePublicHolidays(DateTime.Now, doc);
// Retrieve next year's public holidays
var nextYear = RetrievePublicHolidays(DateTime.Now.AddYears(1), doc);
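Once you have the list returned by RetrievePublicHolidays, a typical next step is checking whether a given day is a holiday. Here is a minimal sketch of one way to do that. The HolidayCalendar class and its members are hypothetical names invented for this example, and the PublicHoliday class is repeated only so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;

// Same shape as the PublicHoliday class from earlier, repeated for a standalone sketch
public class PublicHoliday
{
    public DateTime Date { get; set; }
    public String Name { get; set; }
}

// Hypothetical wrapper that indexes holidays by calendar date for quick lookups
public class HolidayCalendar
{
    private readonly Dictionary<DateTime, string> _namesByDate = new Dictionary<DateTime, string>();

    public HolidayCalendar(IEnumerable<PublicHoliday> holidays)
    {
        foreach (var holiday in holidays)
        {
            // Use only the date part so the time of day never affects the lookup
            _namesByDate[holiday.Date.Date] = holiday.Name;
        }
    }

    public bool IsPublicHoliday(DateTime day)
    {
        return _namesByDate.ContainsKey(day.Date);
    }

    // Returns the holiday name, or null when the day is not a public holiday
    public string NameOf(DateTime day)
    {
        string name;
        return _namesByDate.TryGetValue(day.Date, out name) ? name : null;
    }
}
```

For example, new HolidayCalendar(RetrievePublicHolidays(DateTime.Now, doc)).IsPublicHoliday(DateTime.Today) would tell you whether today is in the parsed list. The dictionary keeps one name per date, which assumes no two holidays fall on the same day.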
That’s all folks. I hope it helps, cheers!
About Hardono